From bms at FreeBSD.org Tue Jul 1 09:14:48 2008
From: bms at FreeBSD.org (Bruce M. Simpson)
Date: Tue Jul 1 09:15:02 2008
Subject: HEAD UP: non-MPSAFE network drivers to be disabled (was: 8.0 network stack MPsafety goals (fwd))
In-Reply-To: <20080629180126.F90836@fledge.watson.org>
References: <20080524111715.T64552@fledge.watson.org> <20080629180126.F90836@fledge.watson.org>
Message-ID: <4869F586.7010708@FreeBSD.org>

Robert Watson wrote:
>
> An FYI on the state of things here: in the last month, John has
> updated a number of device drivers to be MPSAFE, and the USB work
> remains in-flight. I'm holding fire a bit on disabling IFF_NEEDSGIANT
> while things settle and I catch up on driver state, and will likely
> send out an update next week regarding which device drivers remain on
> the kill list, and generally what the status of this project is.

Goliath needs to get stoned, it's been a major hurdle in doing
IGMPv3/SSM because of the locking fandango. I look forward to it.

[For those who ask, what the hell? IGMPv3 potentially makes your wireless
multicast better with or without little things like SSM, because of
protocol robustness, compact state-changes, and the use of a single
link-local IPv4 group for state-change reports, making it easier for
your switches to actually do their job.]

From ed at 80386.nl Wed Jul 2 19:09:03 2008
From: ed at 80386.nl (Ed Schouten)
Date: Wed Jul 2 19:09:09 2008
Subject: MPSAFE TTY schedule
Message-ID: <20080702190901.GS14567@hoeg.nl>

Hello everyone,

About 2 weeks ago I sent a message to these lists about the new MPSAFE
TTY layer I've been working on. In my opinion, it is quite robust and
after some minor polishing it should be a great candidate for inclusion
in FreeBSD -CURRENT (never to be MFC'd).

The last 2 weeks I've ported all the console drivers and fixed some bugs
after I received some feedback from testers.
Below is a list of pros and cons of the new TTY layer:

+ The new TTY layer allows drivers to operate without the Giant lock.
  Drivers like uart(4), the console drivers and the pseudo-terminal
  driver already support fine grained locking.

+ The new pseudo-terminal driver is capable of garbage collecting unused
  PTY's. Because PTY's are never recycled, they are a lot more robust
  (they are always initialized the same, no need to revoke() them before
  usage, etc).

+ The new TTY layer includes a new buffer scheme called the ttyinq and
  ttyoutq. Unlike clists, they provide unbuffered copying of data back
  to userspace, making read() calls in raw mode more efficient.

+ The programming interface for kernel drivers is a lot simpler. There
  is no need to have any knowledge about the internal state of the TTY
  layer.

- Not all drivers have been ported to the new TTY layer yet. These
  drivers still need to be ported: sio(4), cy(4), digi(4), ubser(4),
  uftdi(4), nmdm(4), ng_h4(4), ng_tty(4), snp(4), rp(4), rc(4), si(4),
  umodem(4), dcons(4).

Even though drivers are very important to have, I am convinced we can
get these working not long after the code has been integrated. I have
considered adding two TTY layers to the FreeBSD kernel, but looking at
the drivers, I'm not sure this is worth it; it turns certain parts of
the kernel into rubbish. If you really care about one of these drivers,
please port it to the new TTY layer as soon as possible!

Below is the schedule I'm proposing for the integration of the MPSAFE
TTY layer into our kernel. I would really appreciate it if I could get
this code in before the end of the summer break, because I've got heaps
of spare time to fix any problems then.

July 2 2008 (today):
  Send a schedule to the lists about the integration of the MPSAFE TTY
  layer.

July 13 2008:
  Make uart(4) the default serial port driver, instead of sio(4).
  sio(4) has not been ported to the new TTY layer and is very hard
  to do so.
  uart(4) has been proven to be more portable than sio(4) and already
  supports the hardware we need.

July 20 2008:
  Send another heads-up to the lists about the new TTY layer. Kindly
  ask people to test the patchset, port more drivers, etc.

August 3 2008:
  Disconnect drivers from the build that haven't been patched in the
  MPSAFE TTY branch.

August 8 2008:
  Send the last heads-up to the lists, to warn people about the big
  commit.

August 10 2008:
  Commit the new MPSAFE TTY driver in several commits (first commit the
  layer itself, then commit changes to drivers one by one).

I'll stay close to my inbox for a couple of days after I've integrated
the code, to make sure any bugs that are spotted get fixed shortly.

Please, make sure we can make this a smooth transition by
testing/reviewing my code. I tend to generate diffs very often. They can
be downloaded here:

	http://www.il.fontys.nl/~ed/projects/mpsafetty/patches/

Thanks!
--
 Ed Schouten
 WWW: http://80386.nl/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080702/473f2ca0/attachment.pgp

From nyan at jp.FreeBSD.org Thu Jul 3 13:49:35 2008
From: nyan at jp.FreeBSD.org (Takahashi Yoshihiro)
Date: Thu Jul 3 13:49:41 2008
Subject: MPSAFE TTY schedule
In-Reply-To: <20080702190901.GS14567@hoeg.nl>
References: <20080702190901.GS14567@hoeg.nl>
Message-ID: <20080703.223453.104068461.nyan@jp.FreeBSD.org>

In article <20080702190901.GS14567@hoeg.nl>
Ed Schouten writes:

> July 13 2008:
> Make uart(4) the default serial port driver, instead of sio(4).
> sio(4) has not been ported to the new TTY layer and is very hard
> to do so. uart(4) has been proven to be more portable than
> sio(4) and already supports the hardware we need.

uart(4) does not support PC98 CBus and i8251 devices at all. So I would
like sio(4) to remain.
---
TAKAHASHI Yoshihiro

From ed at 80386.nl Thu Jul 3 20:52:21 2008
From: ed at 80386.nl (Ed Schouten)
Date: Thu Jul 3 20:52:24 2008
Subject: MPSAFE TTY schedule
In-Reply-To: <20080703193406.GS29380@server.vk2pj.dyndns.org>
References: <20080702190901.GS14567@hoeg.nl> <20080703193406.GS29380@server.vk2pj.dyndns.org>
Message-ID: <20080703205220.GW14567@hoeg.nl>

Hello Peter,

* Peter Jeremy wrote:
> On 2008-Jul-02 21:09:01 +0200, Ed Schouten wrote:
> >+ The new pseudo-terminal driver is capable of garbage collecting unused
> > PTY's. Because PTY's are never recycled, they are a lot more robust
> > (they are always initialized the same, no need to revoke() them before
> > usage, etc).
>
> When you say 'never recycled', does this include the PTY number? If
> so, long running busy systems are going to get some fairly large
> numbers. When will the PTY number wrap? What is the impact on tools
> (eg ps, w) that assume they can represent a PTY in a small number of
> digits? What about utmp(5) which uses PTY number in the index?

PTY's are deallocated when unused, which means the PTY number is reused
when possible. We still enforce the 1000 PTY limit, because utmp(5) only
supports line names up to eight bytes long ("pts/999\0").

> >- Not all drivers have been ported to the new TTY layer yet. These
> > drivers still need to be ported: sio(4), cy(4), digi(4), ubser(4),
> > uftdi(4), nmdm(4), ng_h4(4), ng_tty(4), snp(4), rp(4), rc(4), si(4),
> > umodem(4), dcons(4).
> >
> >Even though drivers are very important to have, I am convinced we can
> >get these working not long after the code as been integrated. ...
> > If you really care about one of these drivers,
> >please port it to the new TTY layer as soon as possible!
>
> IMHO, this is not a reasonable approach: "Hi everyone. I'm going to
> break infrastructure that a whole bunch of drivers depend on. If you
> don't fix your drivers in the next few weeks then I'll disconnect
> them".
Either you need to provide compatibility shims (possibly > temporary and not MPSAFE) or you need to be far more pro-active in > assisting with porting existing consumers of the TTY layer. Well, even though I'd rather let other people assess me, I think I've been very proactive so far. As you can see, I sent my email to the list two days ago. In those two days I've fixed both umodem(4) and uftdi(4) to work with the new TTY layer again. > >TTY layer into our kernel. I would really appreciate it if I could get > >this code in before the end of the summer break, because I've got heaps > >of spare time to fix any problems then. > > That's all very nice but what about the maintainers of all the other > drivers that you are impacting? > > > sio(4) has not been ported to the new TTY layer and is very hard > > to do so. > > This is the only mention of how much effort is involved in porting a > driver to use the MPSAFE TTY layer and "very hard" is not a good start. > I can't quickly find any documentation on how to go about porting an > existing driver - definitely there are no section 9 man pages describing > the new API in your patchset. Well, sio(4) isn't impossible to port to the new TTY layer, but the first thing I noticed when I was hacking on the TTY layer during my internship, was that the uart(4) code was so easy to read, I only had to alter a single 396 line C file containing all the TTY interaction, while the sio(4) was somewhat (tenfold) more complex. But I just got told sio(4) is required for pc98, because uart(4) is not supported there. This means I'll seriously consider porting sio(4) one of these days. It's no biggie, even though I think someone could better take the effort to extend uart(4). The same with dcons(4). Even though I don't have any hardware to test it personally, I'll try to make sure we'll get it working in time. 
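
[Editorial sketch, not part of the original message. The 1000 PTY limit
mentioned earlier in this message follows from simple arithmetic on the
line name, which can be checked as below. The eight-byte field size and
the "pts/<n>" naming are taken from the message itself; the helper names
are hypothetical, and this is not FreeBSD code.]

```python
# Sketch of the line-name arithmetic only; this is not FreeBSD code.
# UT_LINESIZE = 8 is an assumption taken from the message ("eight bytes
# long"), and the "pts/<n>" naming follows the example given there.
UT_LINESIZE = 8


def line_name(unit: int) -> bytes:
    """Return the NUL-terminated line name for PTY number `unit`."""
    return f"pts/{unit}".encode("ascii") + b"\0"


def fits_in_utmp(unit: int) -> bool:
    """True if the name, including its trailing NUL, fits in the field."""
    return len(line_name(unit)) <= UT_LINESIZE


def pty_limit() -> int:
    """Count how many PTY numbers, starting at 0, still fit."""
    unit = 0
    while fits_in_utmp(unit):
        unit += 1
    return unit


if __name__ == "__main__":
    print(line_name(999), fits_in_utmp(999))    # 8 bytes, fits
    print(line_name(1000), fits_in_utmp(1000))  # 9 bytes, does not fit
    print(pty_limit())
```

[Under those assumptions PTY numbers 0 through 999 fit, which gives
exactly 1000 usable names.]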
> IMHO, if you can't commit fixed drivers along with the MPSAFE TTY > layer, a more reasonable schedule is to replace the existing TTY layer > with an MPSAFE TTY layer that includes compatibility shims. If the > shims make things non-MPSAFE (which is likely) then warn that they > will be going away in (say) six months. This gives developers a more > reasonable timeframe in which to update, as well as working drivers > whilst they adapt them. So let us take a look at the list again: > sio(4), cy(4), digi(4), ubser(4), uftdi(4), nmdm(4), ng_h4(4), > ng_tty(4), snp(4), rp(4), rc(4), si(4), umodem(4), dcons(4). Removing the drivers which have been fixed, or will be fixed in time: > cy(4), digi(4), ubser(4), nmdm(4), ng_h4(4), ng_tty(4), snp(4), rp(4), > rc(4), si(4). After I've committed the new TTY layer, I'm going to extend its design, so we can have hooks again, similar to the old line discipline idea. This has already been discussed. I'm also planning on reviving drivers like nmdm(4) and snp(4) by then. This means we've only got these drivers left: > cy(4), digi(4), rp(4), rc(4), si(4). Who actually owns one of these devices? If you do, please contact me. If I didn't make myself clear enough: I *am* willing to (assist in porting|port) these drivers. -- Ed Schouten WWW: http://80386.nl/ -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080703/78d41146/attachment.pgp From sam at freebsd.org Thu Jul 3 21:42:00 2008 From: sam at freebsd.org (Sam Leffler) Date: Thu Jul 3 21:42:04 2008 Subject: MPSAFE TTY schedule [uart vs sio] In-Reply-To: <20080703205220.GW14567@hoeg.nl> References: <20080702190901.GS14567@hoeg.nl> <20080703193406.GS29380@server.vk2pj.dyndns.org> <20080703205220.GW14567@hoeg.nl> Message-ID: <486D4006.2050303@freebsd.org> Ed Schouten wrote: > Hello Peter, > > * Peter Jeremy wrote: > >> On 2008-Jul-02 21:09:01 +0200, Ed Schouten wrote: >> >>> + The new pseudo-terminal driver is capable of garbage collecting unused >>> PTY's. Because PTY's are never recycled, they are a lot more robust >>> (they are always initialized the same, no need to revoke() them before >>> usage, etc). >>> >> When you say 'never recycled', does this include the PTY number? If >> so, long running busy systems are going to get some fairly large >> numbers. When will the PTY number wrap? What is the impact on tools >> (eg ps, w) that assume they can represent a PTY in a small number of >> digits? What about utmp(5) which uses PTY number in the index? >> > > PTY's are deallocated when unused, which means the PTY number is reused > when possible. We still enfore the 1000 PTY limit, because utmp(5) only > supports line names of eight bytes long ("pts/999\0"). > > >>> - Not all drivers have been ported to the new TTY layer yet. These >>> drivers still need to be ported: sio(4), cy(4), digi(4), ubser(4), >>> uftdi(4), nmdm(4), ng_h4(4), ng_tty(4), snp(4), rp(4), rc(4), si(4), >>> umodem(4), dcons(4). >>> >>> Even though drivers are very important to have, I am convinced we can >>> get these working not long after the code as been integrated. ... >>> If you really care about one of these drivers, >>> please port it to the new TTY layer as soon as possible! 
>>> >> IMHO, this is not a reasonable approach: "Hi everyone. I'm going to >> break infrastructure that a whole bunch of drivers depend on. If you >> don't fix your drivers in the next few weeks then I'll disconnect >> them". Either you need to provide compatibility shims (possibly >> temporary and not MPSAFE) or you need to be far more pro-active in >> assisting with porting existing consumers of the TTY layer. >> > > Well, even though I'd rather let other people assess me, I think I've > been very proactive so far. As you can see, I sent my email to the list > two days ago. In those two days I've fixed both umodem(4) and uftdi(4) > to work with the new TTY layer again. > > >>> TTY layer into our kernel. I would really appreciate it if I could get >>> this code in before the end of the summer break, because I've got heaps >>> of spare time to fix any problems then. >>> >> That's all very nice but what about the maintainers of all the other >> drivers that you are impacting? >> >> >>> sio(4) has not been ported to the new TTY layer and is very hard >>> to do so. >>> >> This is the only mention of how much effort is involved in porting a >> driver to use the MPSAFE TTY layer and "very hard" is not a good start. >> I can't quickly find any documentation on how to go about porting an >> existing driver - definitely there are no section 9 man pages describing >> the new API in your patchset. >> > > Well, sio(4) isn't impossible to port to the new TTY layer, but the > first thing I noticed when I was hacking on the TTY layer during my > internship, was that the uart(4) code was so easy to read, I only had to > alter a single 396 line C file containing all the TTY interaction, while > the sio(4) was somewhat (tenfold) more complex. > > But I just got told sio(4) is required for pc98, because uart(4) is not > supported there. This means I'll seriously consider porting sio(4) one > of these days. 
It's no biggie, even though I think someone could better > take the effort to extend uart(4). > I would suggest first investigating how difficult it is to port uart to pc98. Given that we're broadening our platform support having a single serial driver seems preferable. > The same with dcons(4). Even though I don't have any hardware to test it > personally, I'll try to make sure we'll get it working in time. > I believe dcons is important and there are many people that can pitch in on this driver so I'm less worried about it. FWIW I'm 100% behind you on moving this stuff forward. This is a large project and you cannot be expected to do it by yourself. In the case of platform-specific requirements (like pc98) where you won't have access to equipment I think it's fair to request platform maintainers help. > >> IMHO, if you can't commit fixed drivers along with the MPSAFE TTY >> layer, a more reasonable schedule is to replace the existing TTY layer >> with an MPSAFE TTY layer that includes compatibility shims. If the >> shims make things non-MPSAFE (which is likely) then warn that they >> will be going away in (say) six months. This gives developers a more >> reasonable timeframe in which to update, as well as working drivers >> whilst they adapt them. >> > > So let us take a look at the list again: > > >> sio(4), cy(4), digi(4), ubser(4), uftdi(4), nmdm(4), ng_h4(4), >> ng_tty(4), snp(4), rp(4), rc(4), si(4), umodem(4), dcons(4). >> > > Removing the drivers which have been fixed, or will be fixed in time: > > >> cy(4), digi(4), ubser(4), nmdm(4), ng_h4(4), ng_tty(4), snp(4), rp(4), >> rc(4), si(4). >> > > After I've committed the new TTY layer, I'm going to extend its design, > so we can have hooks again, similar to the old line discipline idea. > This has already been discussed. I'm also planning on reviving drivers > like nmdm(4) and snp(4) by then. This means we've only got these > drivers left: > > >> cy(4), digi(4), rp(4), rc(4), si(4). 
>> > > Who actually owns one of these devices? If you do, please contact me. If > I didn't make myself clear enough: I *am* willing to (assist in > porting|port) these drivers. > > digi is perhaps most important in this list but I think you should expect other folks to help. Sam From rwatson at FreeBSD.org Fri Jul 4 00:30:19 2008 From: rwatson at FreeBSD.org (Robert Watson) Date: Fri Jul 4 00:30:24 2008 Subject: Remaining non-MPSAFE netisr handlers In-Reply-To: <20080526102345.G26343@fledge.watson.org> References: <20080526102345.G26343@fledge.watson.org> Message-ID: <20080704012901.U90881@fledge.watson.org> On Mon, 26 May 2008, Robert Watson wrote: > In the continuing campaign to eliminate the Giant lock from the dregs of the > network stack, I thought I'd send out a list of non-MPSAFE netisr handlers: > > Location Handler Removed with IFF_NEEDSGIANT > dev/usb/usb_ethersubr.c:120 usbintr Yes > net/if_ppp.c:277 pppintr Yes > netinet6/ip6_input.c ip6_input No > > The plan for 8.0 is to remove the NETISR_MPSAFE flag -- all netisr handlers > will be executed without the Giant lock. This doesn't prohibit acquiring > Giant in the handler if required, although that's undesirable for the > obvious reasons (potentially stalling interrupt handling, etc). Obviously, > what would be most desirable is eliminating the remaining requirement for > Giant in the IPv6 input path, primarily consisting of mld6 and nd6. > > With this in mind, my current plan is to remove the flag and add explicit > Giant acquisition for any remaining handlers in June when IFF_NEEDSGIANT > device drivers are disabled. I've now removed the NETISR_MPSAFE flag -- all netisr handlers are now assumed to DTRT with respect to locking. At least until usb and ppp are sorted out, I've introduced NETISR_FORCEQUEUE as an interim measure, which allows protocols to request that they always operate the deferred dispatch, meaning they can acquire Giant if they need to, and modified those two to do so. 
That should go away by 8.0 also. Robert N M Watson Computer Laboratory University of Cambridge From xcllnt at mac.com Fri Jul 4 01:49:49 2008 From: xcllnt at mac.com (Marcel Moolenaar) Date: Fri Jul 4 01:50:01 2008 Subject: MPSAFE TTY schedule [uart vs sio] In-Reply-To: <486D4006.2050303@freebsd.org> References: <20080702190901.GS14567@hoeg.nl> <20080703193406.GS29380@server.vk2pj.dyndns.org> <20080703205220.GW14567@hoeg.nl> <486D4006.2050303@freebsd.org> Message-ID: <993E865A-A426-4036-9E09-A87D7474DE80@mac.com> On Jul 3, 2008, at 2:09 PM, Sam Leffler wrote: >> But I just got told sio(4) is required for pc98, because uart(4) is >> not >> supported there. This means I'll seriously consider porting sio(4) >> one >> of these days. It's no biggie, even though I think someone could >> better >> take the effort to extend uart(4). >> > > I would suggest first investigating how difficult it is to port uart > to pc98. Given that we're broadening our platform support having a > single serial driver seems preferable. I looked into it in 2003 but since I don't have any hardware, I wasn't the one able to do it. I think the fundamental problem is that the BRG is not part of the UART itself and needs a separate handle or even (tag, handle) pair to access. That's as far as I know the only big thing about the work. For me not having access to the hardware is a showstopper for looking into it myself. -- Marcel Moolenaar xcllnt@mac.com From peterjeremy at optushome.com.au Fri Jul 4 02:21:49 2008 From: peterjeremy at optushome.com.au (Peter Jeremy) Date: Fri Jul 4 02:22:01 2008 Subject: MPSAFE TTY schedule In-Reply-To: <20080703205220.GW14567@hoeg.nl> References: <20080702190901.GS14567@hoeg.nl> <20080703193406.GS29380@server.vk2pj.dyndns.org> <20080703205220.GW14567@hoeg.nl> Message-ID: <20080704022125.GA32475@server.vk2pj.dyndns.org> Hi Ed, Thanks for your response. 
On 2008-Jul-03 22:52:20 +0200, Ed Schouten wrote:
>Well, even though I'd rather let other people assess me, I think I've
>been very proactive so far. As you can see, I sent my email to the list
>two days ago. In those two days I've fixed both umodem(4) and uftdi(4)
>to work with the new TTY layer again.

And I don't expect you to do all the work yourself - you can't be
expected to test hardware you don't have. But where is the documentation
to let someone else adapt the drivers?

>> cy(4), digi(4), rp(4), rc(4), si(4).
>
>Who actually owns one of these devices? If you do, please contact me. If
>I didn't make myself clear enough: I *am* willing to (assist in
>porting|port) these drivers.

I have access to Digi Xem boards at work and have poked around inside
the digi(4) code in the past. My difficulty is that the cards are all in
use and upgrading to a FreeBSD-current that doesn't support them and
then porting the driver is probably not an option (whereas converting it
from using shims to access the TTY layer to doing so directly would
probably be acceptable - because I can get the board going again in a
hurry if needed).

I can't help with any of the others, sorry.

--
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement
an MTA that is either RFC2821-compliant or matches their claimed behaviour.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080704/1ff4659e/attachment.pgp From ed at 80386.nl Fri Jul 4 09:22:45 2008 From: ed at 80386.nl (Ed Schouten) Date: Fri Jul 4 09:22:58 2008 Subject: MPSAFE TTY schedule In-Reply-To: <20080704022125.GA32475@server.vk2pj.dyndns.org> References: <20080702190901.GS14567@hoeg.nl> <20080703193406.GS29380@server.vk2pj.dyndns.org> <20080703205220.GW14567@hoeg.nl> <20080704022125.GA32475@server.vk2pj.dyndns.org> Message-ID: <20080704092244.GY14567@hoeg.nl> Hello Peter, * Peter Jeremy wrote: > On 2008-Jul-03 22:52:20 +0200, Ed Schouten wrote: > >Well, even though I'd rather let other people assess me, I think I've > >been very proactive so far. As you can see, I sent my email to the list > >two days ago. In those two days I've fixed both umodem(4) and uftdi(4) > >to work with the new TTY layer again. > > And I don't expect you to do all the work yourself - you can't be > expected to test hardware you don't have. But where is the documentation > to let someone else adapt the drivers? Well, right now documentation on the KPI is one of the things that is missing. As you mentioned in your first email, I am planning to add several man9 pages, but so far I haven't found any time to write them. I think there are already various (very small) drivers that could be used as examples. Below is a list of files, including their size in lines: | 241 /sys/dev/ofw/ofw_console.c | 181 /sys/ia64/ia64/ssc.c | 371 /sys/sun4v/sun4v/hvcons.c > >> cy(4), digi(4), rp(4), rc(4), si(4). > > > >Who actually owns one of these devices? If you do, please contact me. If > >I didn't make myself clear enough: I *am* willing to (assist in > >porting|port) these drivers. > > I have access to a Digi Xem boards at work and have poked around > inside the digi(4) code in the past. 
My difficulty is that the cards > are all in use and upgrading to a FreeBSD-current that doesn't support > them and then porting the driver is probably not an option (whereas > converting it from using shims to access the TTY layer to doing so > directly would probably be acceptable - because I can get the board > going again in a hurry if needed). The problem with the old TTY layer, is that drivers tend to access the internals of the TTY structure very often. A good example of this is the clists, where TTY drivers tamper around inside the clist and cblock structures. There is not much room to implement a compatibility layer there. The digi(4) code shouldn't be very hard to port. As I said before, I am considering making most drivers at least compile before the code hits the tree, which should make it a lot easier for people to get their things working again. Yours, -- Ed Schouten WWW: http://80386.nl/ -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080704/fef65e07/attachment.pgp From peter at wemm.org Fri Jul 4 09:41:08 2008 From: peter at wemm.org (Peter Wemm) Date: Fri Jul 4 09:41:24 2008 Subject: MPSAFE TTY schedule In-Reply-To: <20080703205220.GW14567@hoeg.nl> References: <20080702190901.GS14567@hoeg.nl> <20080703193406.GS29380@server.vk2pj.dyndns.org> <20080703205220.GW14567@hoeg.nl> Message-ID: On Thu, Jul 3, 2008 at 1:52 PM, Ed Schouten wrote: [..] >> cy(4), digi(4), rp(4), rc(4), si(4). > > Who actually owns one of these devices? If you do, please contact me. If > I didn't make myself clear enough: I *am* willing to (assist in > porting|port) these drivers. I use si extensively. 
For example: si0: port 0xd800-0xd87f mem 0xefffd000-0xefffd07f,0xeffe0000-0xeffeffff irq 18 at device 8.0 on pci1 si0: [GIANT-LOCKED] si0: PLX register 0x50: 0x18044000 changed to 0x18260000 si0: card: SXPCI, ports: 8, modules: 1, type: 40 (XIO) I'll be happy to take a shot at it but I'll need a few hints or worked examples. -- Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com; KI6FJV "All of this is for nothing if we don't go to the stars" - JMS/B5 "If Java had true garbage collection, most programs would delete themselves upon execution." -- Robert Sewell From imp at bsdimp.com Fri Jul 4 12:32:40 2008 From: imp at bsdimp.com (M. Warner Losh) Date: Fri Jul 4 12:32:46 2008 Subject: MPSAFE TTY schedule In-Reply-To: <20080702190901.GS14567@hoeg.nl> References: <20080702190901.GS14567@hoeg.nl> Message-ID: <20080704.063348.-1967689426.imp@bsdimp.com> In message: <20080702190901.GS14567@hoeg.nl> Ed Schouten writes: : - Not all drivers have been ported to the new TTY layer yet. These : drivers still need to be ported: sio(4), cy(4), digi(4), ubser(4), : uftdi(4), nmdm(4), ng_h4(4), ng_tty(4), snp(4), rp(4), rc(4), si(4), : umodem(4), dcons(4). I have cy hardware available. I also have uftdi and umodem hardware. Warner From imp at bsdimp.com Fri Jul 4 12:36:15 2008 From: imp at bsdimp.com (M. Warner Losh) Date: Fri Jul 4 12:36:33 2008 Subject: MPSAFE TTY schedule In-Reply-To: <20080703205220.GW14567@hoeg.nl> References: <20080702190901.GS14567@hoeg.nl> <20080703193406.GS29380@server.vk2pj.dyndns.org> <20080703205220.GW14567@hoeg.nl> Message-ID: <20080704.063655.-1977529554.imp@bsdimp.com> In message: <20080703205220.GW14567@hoeg.nl> Ed Schouten writes: : > cy(4), digi(4), rp(4), rc(4), si(4). : : Who actually owns one of these devices? If you do, please contact me. If : I didn't make myself clear enough: I *am* willing to (assist in : porting|port) these drivers. I can send you cy(4) hardware (both PCI and ISA) if you'd like. 
I have no time, however, to assist in the porting. Warner From imp at bsdimp.com Fri Jul 4 12:36:23 2008 From: imp at bsdimp.com (M. Warner Losh) Date: Fri Jul 4 12:36:33 2008 Subject: MPSAFE TTY schedule [uart vs sio] In-Reply-To: <993E865A-A426-4036-9E09-A87D7474DE80@mac.com> References: <20080703205220.GW14567@hoeg.nl> <486D4006.2050303@freebsd.org> <993E865A-A426-4036-9E09-A87D7474DE80@mac.com> Message-ID: <20080704.063540.1210476607.imp@bsdimp.com> In message: <993E865A-A426-4036-9E09-A87D7474DE80@mac.com> Marcel Moolenaar writes: : : On Jul 3, 2008, at 2:09 PM, Sam Leffler wrote: : : >> But I just got told sio(4) is required for pc98, because uart(4) is : >> not : >> supported there. This means I'll seriously consider porting sio(4) : >> one : >> of these days. It's no biggie, even though I think someone could : >> better : >> take the effort to extend uart(4). : >> : > : > I would suggest first investigating how difficult it is to port uart : > to pc98. Given that we're broadening our platform support having a : > single serial driver seems preferable. : : I looked into it in 2003 but since I don't have any hardware, : I wasn't the one able to do it. I think the fundamental problem : is that the BRG is not part of the UART itself and needs a : separate handle or even (tag, handle) pair to access. That's as : far as I know the only big thing about the work. : : For me not having access to the hardware is a showstopper for : looking into it myself. Do you need physical access? I have a pc98 machine I can put back on the network. It has the 8251 chip in it. It also has a 16550 part as well since it is a later model which had both... I believe that uart works for the 16550 part, but haven't tried it lately... 
Warner From nyan at jp.FreeBSD.org Fri Jul 4 13:12:54 2008 From: nyan at jp.FreeBSD.org (Takahashi Yoshihiro) Date: Fri Jul 4 13:12:59 2008 Subject: MPSAFE TTY schedule [uart vs sio] In-Reply-To: <20080704.063540.1210476607.imp@bsdimp.com> References: <486D4006.2050303@freebsd.org> <993E865A-A426-4036-9E09-A87D7474DE80@mac.com> <20080704.063540.1210476607.imp@bsdimp.com> Message-ID: <20080704.221043.226715262.nyan@jp.FreeBSD.org> In article <20080704.063540.1210476607.imp@bsdimp.com> "M. Warner Losh" writes: > Do you need physical access? I have a pc98 machine I can put back on > the network. It has the 8251 chip in it. It also has a 16550 part as > well since it is a later model which had both... > > I believe that uart works for the 16550 part, but haven't tried it > lately... The uart probably works for some 16550 based devices but does not work for other one like multi-port devices. --- TAKAHASHI Yoshihiro From cokane at FreeBSD.org Fri Jul 4 14:57:39 2008 From: cokane at FreeBSD.org (Coleman Kane) Date: Fri Jul 4 14:57:46 2008 Subject: MPSAFE TTY schedule In-Reply-To: <20080704.063348.-1967689426.imp@bsdimp.com> References: <20080702190901.GS14567@hoeg.nl> <20080704.063348.-1967689426.imp@bsdimp.com> Message-ID: <1215182359.2446.27.camel@localhost> On Fri, 2008-07-04 at 06:33 -0600, M. Warner Losh wrote: > In message: <20080702190901.GS14567@hoeg.nl> > Ed Schouten writes: > : - Not all drivers have been ported to the new TTY layer yet. These > : drivers still need to be ported: sio(4), cy(4), digi(4), ubser(4), > : uftdi(4), nmdm(4), ng_h4(4), ng_tty(4), snp(4), rp(4), rc(4), si(4), > : umodem(4), dcons(4). > > I have cy hardware available. I also have uftdi and umodem hardware. > > Warner FWIW, I think that many mobile phones of the Motorola variety, when used with the USB adapter, will use the umodem driver. I am not sure of other phones though, as I've only gone through about 4 generations of Motorola hardware since ~2003. 
-- Coleman Kane -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: This is a digitally signed message part Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080704/3c4b67ea/attachment.pgp From xcllnt at mac.com Fri Jul 4 19:42:37 2008 From: xcllnt at mac.com (Marcel Moolenaar) Date: Fri Jul 4 19:42:43 2008 Subject: MPSAFE TTY schedule [uart vs sio] In-Reply-To: <20080704.063540.1210476607.imp@bsdimp.com> References: <20080703205220.GW14567@hoeg.nl> <486D4006.2050303@freebsd.org> <993E865A-A426-4036-9E09-A87D7474DE80@mac.com> <20080704.063540.1210476607.imp@bsdimp.com> Message-ID: On Jul 4, 2008, at 5:35 AM, M. Warner Losh wrote: > : I looked into it in 2003 but since I don't have any hardware, > : I wasn't the one able to do it. I think the fundamental problem > : is that the BRG is not part of the UART itself and needs a > : separate handle or even (tag, handle) pair to access. That's as > : far as I know the only big thing about the work. > : > : For me not having access to the hardware is a showstopper for > : looking into it myself. > > Do you need physical access? No, not at all. As long as the USART is connected to another UART that I can access and as long as I can load/unload the uart(4) module, I should be able to write the support for it. Even only restricted sudo(1) should be fine. > I have a pc98 machine I can put back on > the network. It has the 8251 chip in it. It also has a 16550 part as > well since it is a later model which had both... Perfect. A null-modem cable between the two of them and I should be all set. 
-- Marcel Moolenaar xcllnt@mac.com From xcllnt at mac.com Fri Jul 4 19:50:34 2008 From: xcllnt at mac.com (Marcel Moolenaar) Date: Fri Jul 4 19:50:40 2008 Subject: MPSAFE TTY schedule [uart vs sio] In-Reply-To: <20080704.221043.226715262.nyan@jp.FreeBSD.org> References: <486D4006.2050303@freebsd.org> <993E865A-A426-4036-9E09-A87D7474DE80@mac.com> <20080704.063540.1210476607.imp@bsdimp.com> <20080704.221043.226715262.nyan@jp.FreeBSD.org> Message-ID: <29489C48-93A2-41D9-9EF1-5395A673A9B3@mac.com> On Jul 4, 2008, at 6:10 AM, Takahashi Yoshihiro wrote: > In article <20080704.063540.1210476607.imp@bsdimp.com> > "M. Warner Losh" writes: > >> Do you need physical access? I have a pc98 machine I can put back on >> the network. It has the 8251 chip in it. It also has a 16550 part >> as >> well since it is a later model which had both... >> >> I believe that uart works for the 16550 part, but haven't tried it >> lately... > > The uart probably works for some 16550 based devices but does not work > for other one like multi-port devices. The design principle of uart(4) is that it does not know about multi-port hardware. It controls a single serial port only. For multi-port hardware you must have multiple nodes on a bus or use an umbrella driver, such as puc(4), quicc(4) or scc(4). Those drivers provide attachments for every port. I suspect that support for multi-port devices is not too hard to do on pc98...
-- Marcel Moolenaar xcllnt@mac.com From nyan at jp.FreeBSD.org Sat Jul 5 12:25:42 2008 From: nyan at jp.FreeBSD.org (Takahashi Yoshihiro) Date: Sat Jul 5 12:25:54 2008 Subject: MPSAFE TTY schedule [uart vs sio] In-Reply-To: <29489C48-93A2-41D9-9EF1-5395A673A9B3@mac.com> References: <20080704.063540.1210476607.imp@bsdimp.com> <20080704.221043.226715262.nyan@jp.FreeBSD.org> <29489C48-93A2-41D9-9EF1-5395A673A9B3@mac.com> Message-ID: <20080705.212422.226755141.nyan@jp.FreeBSD.org> In article <29489C48-93A2-41D9-9EF1-5395A673A9B3@mac.com> Marcel Moolenaar writes: > > The uart probably works for some 16550 based devices but does not work > > for other one like multi-port devices. > > The design principle of uart(4) is that it does not know > about multi-port hardware. It controls a single serial > port only. For multi-port hardware you must have multiple > nodes on a bus or use an umbrella driver, such as puc(4), > quicc(4) or scc(4). Those drivers provide attachments for > every port. > > I suspect that support for multi-port devices is not to > hard to do on pc98... Many serial devices on pc98 use indirect I/O space, so resource management is quite complex. Therefore, it may need more work than you think. As a starting point, I have added a CBus frontend and fixed console support for pc98.
http://home.jp.freebsd.org/~nyan/patches/uart_pc98.diff.bz2 --- TAKAHASHI Yoshihiro From xcllnt at mac.com Sat Jul 5 16:04:59 2008 From: xcllnt at mac.com (Marcel Moolenaar) Date: Sat Jul 5 16:05:12 2008 Subject: MPSAFE TTY schedule [uart vs sio] In-Reply-To: <20080705.212422.226755141.nyan@jp.FreeBSD.org> References: <20080704.063540.1210476607.imp@bsdimp.com> <20080704.221043.226715262.nyan@jp.FreeBSD.org> <29489C48-93A2-41D9-9EF1-5395A673A9B3@mac.com> <20080705.212422.226755141.nyan@jp.FreeBSD.org> Message-ID: <254B5D19-E08A-43A0-AB76-43299C4AD77C@mac.com> On Jul 5, 2008, at 5:24 AM, Takahashi Yoshihiro wrote: > In article <29489C48-93A2-41D9-9EF1-5395A673A9B3@mac.com> > Marcel Moolenaar writes: > >>> The uart probably works for some 16550 based devices but does not >>> work >>> for other one like multi-port devices. >> >> The design principle of uart(4) is that it does not know >> about multi-port hardware. It controls a single serial >> port only. For multi-port hardware you must have multiple >> nodes on a bus or use an umbrella driver, such as puc(4), >> quicc(4) or scc(4). Those drivers provide attachments for >> every port. >> >> I suspect that support for multi-port devices is not to >> hard to do on pc98... > > Many serial devices on pc98 use indirect I/O space, so resource > management is quite complex. Therefore, it may need more work you > think. I'm not sure I understand exactly what that means. Can you elaborate? > At the starting point, I have added CBus frontend and fixed console > support for pc98. Great, thanks! Could you commit sys/pc98/include/bus.h and sys/pc98/pc98/busiosubr.c at your earliest convenience. That code has to be in the kernel if I were to work on the uart module. 
Thanks, -- Marcel Moolenaar xcllnt@mac.com From brde at optusnet.com.au Sun Jul 6 15:04:21 2008 From: brde at optusnet.com.au (Bruce Evans) Date: Sun Jul 6 15:04:34 2008 Subject: MPSAFE TTY schedule In-Reply-To: <20080704092244.GY14567@hoeg.nl> References: <20080702190901.GS14567@hoeg.nl> <20080703193406.GS29380@server.vk2pj.dyndns.org> <20080703205220.GW14567@hoeg.nl> <20080704022125.GA32475@server.vk2pj.dyndns.org> <20080704092244.GY14567@hoeg.nl> Message-ID: <20080705095912.M12433@delplex.bde.org> On Fri, 4 Jul 2008, Ed Schouten wrote: >>>> cy(4), digi(4), rp(4), rc(4), si(4). >>> >>> Who actually owns one of these devices? If you do, please contact me. If >>> I didn't make myself clear enough: I *am* willing to (assist in >>> porting|port) these drivers. I have 24 ports on cy devices, but don't use them except for testing. >> I have access to a Digi Xem boards at work and have poked around >> inside the digi(4) code in the past. My difficulty is that the cards >> are all in use and upgrading to a FreeBSD-current that doesn't support >> them and then porting the driver is probably not an option (whereas >> converting it from using shims to access the TTY layer to doing so >> directly would probably be acceptable - because I can get the board >> going again in a hurry if needed). > > The problem with the old TTY layer, is that drivers tend to access the > internals of the TTY structure very often. A good example of this is the > clists, where TTY drivers tamper around inside the clist and cblock > structures. There is not much room to implement a compatibility layer > there. This is a very bad example. Clist accesses are, or should be, a non- problem. No (non-broken) tty drivers access the internals of clists directly, except for read-only accesses to the character count (c_cc), which isn't clist-specific. 
All of them use the old KPI functions (putc/getc/b_to_q/q_to_b, etc) for accessing the tty queues, and the implementation of the tty queues can be changed to anything without any KPI changes except possibly to the spelling of c_cc. Non-drivers like slip and ppp have slightly more clist-specific knowledge, but again their interface is limited mainly to the KPI (clist_alloc_cblocks, ...) and a read-only character count (cfreecount). if_sl.c still has a lot of comments about its knowledge of clists, but these barely apply since its implementation only depends on an adequate buffering mechanism for characters (not sure if the characters need to be quotable).

Driver-specific locking for clists is even less of a problem. No driver-specific locking is needed for the calls, since they are locked (using spl or Giant) in clist internals. The direct accesses to c_cc should be locked in the same way, but missing locking for these is almost harmless since the accesses are read-only and reading a stale value is usually harmless. Internal locking at each entry point of the KPI encourages unlocked accesses to c_cc, since c_cc becomes volatile immediately after releasing the internal lock. In practice, things like t_oproc() routines use higher-level locking so that there is no race and the internal locking in getc/q_to_b is bogus:

    xxxstart:
        /*
         * Lock whole loop.  Without this, getc()'s access to c_cc would
         * give the same race as our direct access after getc() unlocks.
         * With this, getc()'s locking is just a waste of time (it's
         * recursive so that this isn't fatal, so the waste of time
         * hopefully isn't very large, but this requires recursive locking
         * to be used all over, so there are time and robustness costs all
         * over).
         */
        spltty();       /* Or Giant in bad drivers. */
        /*
         * Non-Giant locking at a higher level than here might be OK, or
         * might not, depending on whether the driver wants very fine-grained
         * locking.  If this is done, then we can remove all these
         * fine-grained spl lock calls in drivers instead of replacing them
         * by a tty lock.  Meanwhile, the spls serve as placeholders to
         * remind us where to put the tty locks.
         */
        while (tp->t_outq.c_cc != 0 &&  /* XXX efficiency hack. */
            (ch = getc(&tp->t_outq)) != -1)
            move_ch_to_driver_buffer(...);
        splx();

A possibly better way to handle this is to accept losing races on the read-only variable but ensure that the driver is woken up without much delay if the variable changes. This already happens in most or all cases, since changing c_cc requires action to process the new state.

The number of KPI and c_cc accesses is also very small. In most drivers it consists of a whole 1 getc or q_to_b call in t_oproc() and a whole 1 c_cc read to avoid this call. A few drivers implement a bulk input routine that bypasses t_rint() in the TS_CAN_BYPASS_L_RINT case. This requires 1 b_to_q call and 1 c_cc read to implement flow control. This should be in the tty layer. It doesn't break the clist layering but it breaks the tty layering. slip and ppp make much heavier use of the KPI by count of the number of calls (about 5 putc's, 2 unputc's.... each).

By churning the KPI, you create a lot of work.

Bruce From babkin at verizon.net Sun Jul 6 22:30:21 2008 From: babkin at verizon.net (Sergey Babkin) Date: Sun Jul 6 22:30:27 2008 Subject: Proposal: a revoke() system call Message-ID: <48714866.906912CC@verizon.net> Hi all, I want to propose a system call with the following functionality: Syntax: int revoke(int fd, int flags) Revoke a file descriptor from this process. For all practical purposes, it's equivalent to close(), except that the descriptor (fd) is not freed. Any further calls (except close()) on this fd would return an error. Close() would free the file descriptor as usual. If any calls were in progress sleeping (such as read() waiting for data), they would be interrupted and return an error.
Flags could contain a bitmap that would modify the meaning of the call. I can think of at least one such modification: REVOKE_EOF, which, if set, would make any further read() calls return 0 (EOF indication) instead of an error. Rationale: In multithreaded programs, multiple threads often work with the same file descriptor. A particularly typical situation is a reader thread and a writer thread. The reader thread calls read(), gets blocked until it gets more data, then processes the data and continues the loop. Another example of a "reader thread" would be the main thread of a daemon that accepts the incoming connections and starts new per-connection threads. If the application decides that it wants to close this file descriptor abruptly, getting the reader thread to wake up and exit is not easy. It's fraught with synchronisation issues. Things get even more complicated if there are multiple layers of library wrappers. The proposed system call makes it easy to pretend that the file descriptor has experienced an error (or that a socket connection has been closed by the other side). The library layers should already be able to handle errors, so the problem would be solved transparently for them. For sockets, similar functionality can already be achieved with shutdown(fd, SHUT_RDWR). But it works only for connected sockets, not for other file types nor for sockets running accept(). A new system call would apply it to all kinds of file descriptors. Another option is to extend the shutdown() call to non-socket file descriptors. Any comments? Would anyone mind if I implement it?
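For illustration, the shutdown(fd, SHUT_RDWR) behaviour mentioned above for connected sockets can be sketched in C roughly as follows. This is only a minimal sketch and not part of the proposal itself: the names reader() and shutdown_wakes_reader() are made up for the example, and a socketpair(2) stands in for a real connection. The assumption is that, on a connected stream socket, shutdown() makes a read() blocked in another thread return 0 (EOF).

```c
#include <poll.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Reader thread: loops on read() until EOF (0) or an error (-1). */
static void *reader(void *arg)
{
    int fd = *(int *)arg;
    char buf[64];
    ssize_t n;

    while ((n = read(fd, buf, sizeof(buf))) > 0)
        continue;               /* a real program would process the data */
    return (void *)(intptr_t)n; /* hand the final read() result back */
}

/*
 * Hypothetical demo function: starts a reader on one end of a socketpair,
 * then calls shutdown() on that end from the calling thread.
 * Returns the reader's last read() result, or -2 if setup failed.
 */
long shutdown_wakes_reader(void)
{
    int sv[2];
    pthread_t tid;
    void *res = NULL;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -2;
    if (pthread_create(&tid, NULL, reader, &sv[0]) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -2;
    }
    poll(NULL, 0, 100);          /* demo only: let the reader block first */
    shutdown(sv[0], SHUT_RDWR);  /* the descriptor itself stays allocated */
    pthread_join(tid, &res);     /* close() only after the reader is gone */
    close(sv[0]);
    close(sv[1]);
    return (long)(intptr_t)res;
}
```

Note that the descriptor is closed only after the reader has been joined, so its number cannot be reused by another open() while a thread might still be using it. The sketch says nothing about listening sockets or non-socket descriptors, which is exactly the gap the proposal is about.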
-SB From rwatson at FreeBSD.org Sun Jul 6 23:05:30 2008 From: rwatson at FreeBSD.org (Robert Watson) Date: Sun Jul 6 23:05:36 2008 Subject: Proposal: a revoke() system call In-Reply-To: <48714866.906912CC@verizon.net> References: <48714866.906912CC@verizon.net> Message-ID: <20080707000313.P56885@fledge.watson.org> On Sun, 6 Jul 2008, Sergey Babkin wrote: > int revoke(int fd, int flags) Seems like that conflicts with our existing revoke(2) system call. You could achieve something of the same end by opening /dev/null and then dup2()'ing to the file descriptor you want to revoke, perhaps? Right now there's a known issue that calling close(2) on a socket from one thread doesn't interrupt a socket in a blocking I/O call from another thread -- you first have to call shutdown(2), and then close(2). This has caused problems for Java in the past, but I'm not sure that it's really a bug given that it's not unreasonable behavior not rejected by the spec :-). Robert N M Watson Computer Laboratory University of Cambridge > > Revoke a file desriptor from this proces. For all practical > purposes, it's equivalent to close(), except that the descriptor > (fd) is not freed. Any further calls (except close()) on this fd > would return an error. Close() would free the file descriptor > as usual. If any calls were in progress sleeping (such as read() > waiting for data), they would be interrupted and return an error. > > Flags could contain a bitmap that would modify the meaning of the > call. I can think of at least one such modification: REVOKE_EOF, > that if set, would make any further read() calls return 0 (EOF > indication) instead of an error. > > Rationale: > > In the multithreaded programs often multiple threads work with the > same file descriptor. A particularly typical situation is a reader > thread and a writer thread. The reader thread calls read(), gets > blocked until it gets more data, then processes the data and > continues the loop. 
Another example of a "reader thread" would be > the main thread of a daemon that accepts the incoming connections > and starts new per-connection threads. > > If the application decides that it wants to close this file > descriptor abruptly, getting the reader thread to wake up and exit > is not easy. It's fraught with synchronisation issues. > Things get even more complicated if there are multiple layers > of library wrappers. > > The proposed system call makes it easy to pretend that the file > descriptor has experienced an error (or that a socket connection > has been closed by the other side). The library layers should be > already able to handle errors, so the problem would be solved > transparently for them. For sockets a similar > functionality can already be achieved with shutdown(fd, SHUT_RDWR). > But it works only for connected sockets, not for other file types > nor sockets runnig accept(). A new system call would apply it > to all the kinds of file descriptors. Another option is > to extend the shutdown() call to the non-socket file descriptors. > > Any comments? Would anyone mind if I implement it? > > -SB From phk at phk.freebsd.dk Mon Jul 7 07:51:15 2008 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Mon Jul 7 07:51:22 2008 Subject: Proposal: a revoke() system call In-Reply-To: Your message of "Sun, 06 Jul 2008 18:34:14 -0400." <48714866.906912CC@verizon.net> Message-ID: <4219.1215417072@critter.freebsd.dk> In message <48714866.906912CC@verizon.net>, Sergey Babkin writes: >Hi all, > >I want to propose a system call with the following functionality: > >Syntax: > > int revoke(int fd, int flags) We already have a revoke(2) system call, so the name will have to be something different.
>Rationale: > >In the multithreaded programs often multiple threads work with the >same file descriptor. A particularly typical situation is a reader >thread and a writer thread. The reader thread calls read(), gets >blocked until it gets more data, then processes the data and >continues the loop. Another example of a "reader thread" would be >the main thread of a daemon that accepts the incoming connections >and starts new per-connection threads. Have you tried to implement the functionality you're asking for ? You'll have to hunt down into all sorts of protocols, drivers and other code to find the threads sleeping on your fd so you can wake them. It may be quite a piece of work. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From peterjeremy at optushome.com.au Mon Jul 7 08:43:34 2008 From: peterjeremy at optushome.com.au (Peter Jeremy) Date: Mon Jul 7 08:43:41 2008 Subject: MPSAFE TTY schedule In-Reply-To: <20080702190901.GS14567@hoeg.nl> References: <20080702190901.GS14567@hoeg.nl> Message-ID: <20080703193406.GS29380@server.vk2pj.dyndns.org> On 2008-Jul-02 21:09:01 +0200, Ed Schouten wrote: >+ The new pseudo-terminal driver is capable of garbage collecting unused > PTY's. Because PTY's are never recycled, they are a lot more robust > (they are always initialized the same, no need to revoke() them before > usage, etc). When you say 'never recycled', does this include the PTY number? If so, long running busy systems are going to get some fairly large numbers. When will the PTY number wrap? What is the impact on tools (eg ps, w) that assume they can represent a PTY in a small number of digits? What about utmp(5) which uses PTY number in the index? >- Not all drivers have been ported to the new TTY layer yet. 
These > drivers still need to be ported: sio(4), cy(4), digi(4), ubser(4), > uftdi(4), nmdm(4), ng_h4(4), ng_tty(4), snp(4), rp(4), rc(4), si(4), > umodem(4), dcons(4). > >Even though drivers are very important to have, I am convinced we can >get these working not long after the code as been integrated. ... > If you really care about one of these drivers, >please port it to the new TTY layer as soon as possible! IMHO, this is not a reasonable approach: "Hi everyone. I'm going to break infrastructure that a whole bunch of drivers depend on. If you don't fix your drivers in the next few weeks then I'll disconnect them". Either you need to provide compatibility shims (possibly temporary and not MPSAFE) or you need to be far more pro-active in assisting with porting existing consumers of the TTY layer. >TTY layer into our kernel. I would really appreciate it if I could get >this code in before the end of the summer break, because I've got heaps >of spare time to fix any problems then. That's all very nice but what about the maintainers of all the other drivers that you are impacting? > sio(4) has not been ported to the new TTY layer and is very hard > to do so. This is the only mention of how much effort is involved in porting a driver to use the MPSAFE TTY layer and "very hard" is not a good start. I can't quickly find any documentation on how to go about porting an existing driver - definitely there are no section 9 man pages describing the new API in your patchset. IMHO, if you can't commit fixed drivers along with the MPSAFE TTY layer, a more reasonable schedule is to replace the existing TTY layer with an MPSAFE TTY layer that includes compatibility shims. If the shims make things non-MPSAFE (which is likely) then warn that they will be going away in (say) six months. This gives developers a more reasonable timeframe in which to update, as well as working drivers whilst they adapt them. 
-- Peter Jeremy Please excuse any delays as the result of my ISP's inability to implement an MTA that is either RFC2821-compliant or matches their claimed behaviour. From bugmaster at FreeBSD.org Mon Jul 7 11:06:56 2008 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon Jul 7 11:07:34 2008 Subject: Current problem reports assigned to freebsd-arch@FreeBSD.org Message-ID: <200807071106.m67B6tZE061933@freefall.freebsd.org> Current FreeBSD problem reports Critical problems Serious problems Non-critical problems S Tracker Resp. Description -------------------------------------------------------------------------------- o kern/120749 arch [request] Suggest upping the default kern.ps_arg_cache 1 problem total. From babkin at verizon.net Mon Jul 7 15:05:39 2008 From: babkin at verizon.net (Sergey Babkin) Date: Mon Jul 7 15:05:46 2008 Subject: Proposal: a revoke() system call Message-ID: <7100389.65001215443113960.JavaMail.root@vms074.mailsrvcs.net> >On Sun, 6 Jul 2008, Sergey Babkin wrote: > >> int revoke(int fd, int flags) > >Seems like that conflicts with our existing revoke(2) system call. You could Aha, I guess when I've checked, I've looked at a real old version of FreeBSD. Sure, the name can be changed. >achieve something of the same end by opening /dev/null and then dup2()'ing to >the file descriptor you want to revoke, perhaps? Right now there's a known That's a great idea. I haven't thought about it. It should do everything. >issue that calling close(2) on a socket from one thread doesn't interrupt a >socket in a blocking I/O call from another thread -- you first have to call >shutdown(2), and then close(2).
This has caused problems for Java in the >past, but I'm not sure that it's really a bug given that it's not unreasonable >behavior not rejected by the spec :-). Maybe I'll see if I can fix that. -SB From babkin at verizon.net Mon Jul 7 15:12:43 2008 From: babkin at verizon.net (Sergey Babkin) Date: Mon Jul 7 15:12:49 2008 Subject: Proposal: a revoke() system call Message-ID: <1878557.67061215443549669.JavaMail.root@vms074.mailsrvcs.net> >>Rationale: >> >>In the multithreaded programs often multiple threads work with the >>same file descriptor. A particularly typical situation is a reader >>thread and a writer thread. The reader thread calls read(), gets >>blocked until it gets more data, then processes the data and >>continues the loop. Another example of a "reader thread" would be >>the main thread of a daemon that accepts the incoming connections >>and starts new per-connection threads. > >Have you tried to implement the functionality you're asking for ? > >You'll have to hunt down into all sorts of protocols, drivers >and other code to find the threads sleeping on your fd so you can >wake them. My thinking has been that if close() wakes them up, then things would be inherited from there. The thing I didn't know is that apparently in many cases close() doesn't wake them up. -SB From rwatson at FreeBSD.org Mon Jul 7 15:30:07 2008 From: rwatson at FreeBSD.org (Robert Watson) Date: Mon Jul 7 15:30:14 2008 Subject: Proposal: a revoke() system call In-Reply-To: <7100389.65001215443113960.JavaMail.root@vms074.mailsrvcs.net> References: <7100389.65001215443113960.JavaMail.root@vms074.mailsrvcs.net> Message-ID: <20080707162733.V63144@fledge.watson.org> On Mon, 7 Jul 2008, Sergey Babkin wrote: >> On Sun, 6 Jul 2008, Sergey Babkin wrote: >> >>> int revoke(int fd, int flags) >> >> Seems like that conflicts with our existing revoke(2) system call. You >> could > > Aha, I guess when I've checked, I've looked at a real old version of > FreeBSD. Sure, the name can be changed. 
I won't point you at the HISTORY section of the revoke(2) system call then :-). >> achieve something of the same end by opening /dev/null and then dup2()'ing >> to the file descriptor you want to revoke, perhaps? Right now there's a >> known > > That's a great idea. I haven't thought about it. It should do everything. Right, and possibly this means that no additional kernel support is required -- we just make it a libc or libutil interface. >> issue that calling close(2) on a socket from one thread doesn't interrupt a >> socket in a blocking I/O call from another thread -- you first have to call >> shutdown(2), and then close(2). This has caused problems for Java in the >> past, but I'm not sure that it's really a bug given that it's not >> unreasonable behavior not rejected by the spec :-). > > Maybe I'll see if I can fix that. Well, fixing this is easy -- instead of holding a reference to the file descriptor over the system call, hold a reference to the socket. The problem with that is that it creates a lot more contention on the socket locks when the reference count is dropped, not to mention more locking operations. This can be fixed but requires quite a lot of work, whereas this rather minor semantic issue is a non-problem in practice. I do have dealing with this reference issue on my todo list, but it's very low on the list because there are lots of other areas where we can significantly improve performance or semantics more easily and more quickly :-). 
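As a rough sketch of what such a libc or libutil helper might look like, based on the /dev/null plus dup2() idea above: the name fd_neuter() is hypothetical and does not exist in any library, and the sketch only covers invalidating the descriptor for future callers. It makes no attempt to wake threads that are already blocked on the old object.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not an existing libc/libutil function):
 * replace whatever 'fd' refers to with /dev/null, keeping the
 * descriptor number allocated.  Later read() calls on 'fd' then
 * see EOF and later write() calls are silently discarded.
 * Returns 0 on success, -1 on failure (errno set by open/dup2).
 */
int fd_neuter(int fd)
{
    int nullfd = open("/dev/null", O_RDWR);

    if (nullfd < 0)
        return -1;
    if (dup2(nullfd, fd) < 0) {   /* atomically closes the old object */
        close(nullfd);
        return -1;
    }
    if (nullfd != fd)             /* keep only the duplicate on 'fd' */
        close(nullfd);
    return 0;
}
```

Because /dev/null reads as EOF, this behaves more like the REVOKE_EOF variant of the proposal than like the default "return an error" variant.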
Robert N M Watson Computer Laboratory University of Cambridge From cokane at FreeBSD.org Mon Jul 7 15:39:50 2008 From: cokane at FreeBSD.org (Coleman Kane) Date: Mon Jul 7 15:39:56 2008 Subject: Proposal: a revoke() system call In-Reply-To: <1878557.67061215443549669.JavaMail.root@vms074.mailsrvcs.net> References: <1878557.67061215443549669.JavaMail.root@vms074.mailsrvcs.net> Message-ID: <1215445021.2033.13.camel@localhost> On Mon, 2008-07-07 at 10:12 -0500, Sergey Babkin wrote: > >>Rationale: > >> > >>In the multithreaded programs often multiple threads work with the > >>same file descriptor. A particularly typical situation is a reader > >>thread and a writer thread. The reader thread calls read(), gets > >>blocked until it gets more data, then processes the data and > >>continues the loop. Another example of a "reader thread" would be > >>the main thread of a daemon that accepts the incoming connections > >>and starts new per-connection threads. > > > >Have you tried to implement the functionality you're asking for ? > > > >You'll have to hunt down into all sorts of protocols, drivers > >and other code to find the threads sleeping on your fd so you can > >wake them. > > My thinking has been that if close() wakes them up, then things would be > inherited from there. The thing I didn't know is that apparently in many cases close() > doesn't wake them up. > > -SB > In cases where I need to wake the select() up immediately for cases such as this, I've implemented a "trigger pipe" that I include on the select list. This is a simple pipe, that is written to by the application in the cases where I want to design multi-threaded blocking I/O mechanisms such as this to wake up immediately. The "master thread" writes to the pipe, and the blocking thread gets notified of the readability of that pipe's fd through my blocking select() call. Then, it attempts to read (non-blocking-read) from the fd that I had closed, and gets an error returned. 
The reader thread then knows to exit the read-loop and return to the caller (which hopefully cleans it up later with pthread_join). That method seems to work really well in a relatively cross-platform manner. I typically shy away from these types of designs, however. I attempt to ensure that my "reader threads" use select() calls with a reasonable timeout (100ms or even 250ms is usually decent for non-realtime software), and have an external trigger variable such as a bool or similar (named stop_threads) that is part of the struct pointer that I have passed in the void* argument to the thread function when calling pthread_create(). Basically, my master thread would have a cleanup routine that runs at shut-down and sets the variable to true. It then proceeds to pthread_join() all of the threads that trigger their exits on that variable. Only once all threads are joined do I close() the file descriptor. The sequence of events can easily be applied to non-shutdown events where such behavior is desired, however. The key point here is the use of select() to determine when a descriptor is readable, and then using non-blocking I/O to perform the actual read/write calls. -- Coleman Kane From phk at phk.freebsd.dk Mon Jul 7 16:04:12 2008 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Mon Jul 7 16:04:19 2008 Subject: Proposal: a revoke() system call In-Reply-To: Your message of "Mon, 07 Jul 2008 10:12:29 EST."
<1878557.67061215443549669.JavaMail.root@vms074.mailsrvcs.net> Message-ID: <6860.1215446650@critter.freebsd.dk> In message <1878557.67061215443549669.JavaMail.root@vms074.mailsrvcs.net>, Sergey Babkin writes: >My thinking has been that if close() wakes them up, then things would be >inherited from there. The thing I didn't know is that apparently in many cases close() >doesn't wake them up. It's a novel idea, seen with POSIX eyes, that a thread can close a fd it is sleeping on, so the semantics, however obvious they might be, are not described in the standards, and more importantly, not described in the code either. The device driver problem has more angles to it and should be thought out separately, since the same basic functionality is required for hardware removal, only more draconian. I'm not saying that such a systemcall is not a good idea, I'm merely very cautious about what it takes to implement it. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From phk at phk.freebsd.dk Mon Jul 7 16:05:56 2008 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Mon Jul 7 16:06:02 2008 Subject: Proposal: a revoke() system call In-Reply-To: Your message of "Mon, 07 Jul 2008 16:30:06 +0100." <20080707162733.V63144@fledge.watson.org> Message-ID: <6882.1215446754@critter.freebsd.dk> In message <20080707162733.V63144@fledge.watson.org>, Robert Watson writes: >>> achieve something of the same end by opening /dev/null and then dup2()'ing >>> to the file descriptor you want to revoke, perhaps? Right now there's a >>> known >> >> That's a great idea. I haven't thought about it. It should do everything. > >Right, and possibly this means that no additional kernel support is required >-- we just make it a libc or libutil interface. I can't see how that could possibly work...
If you do a dup2(), the original fd is closed, and that still does not release all threads that may be sleeping on it in device drivers. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From rwatson at FreeBSD.org Mon Jul 7 16:33:23 2008 From: rwatson at FreeBSD.org (Robert Watson) Date: Mon Jul 7 16:33:29 2008 Subject: Proposal: a revoke() system call In-Reply-To: <6882.1215446754@critter.freebsd.dk> References: <6882.1215446754@critter.freebsd.dk> Message-ID: <20080707173102.L63144@fledge.watson.org> On Mon, 7 Jul 2008, Poul-Henning Kamp wrote: > In message <20080707162733.V63144@fledge.watson.org>, Robert Watson writes: > >>>> achieve something of the same end by opening /dev/null and then >>>> dup2()'ing to the file descriptor you want to revoke, perhaps? Right now >>>> there's a known >>> >>> That's a great idea. I haven't thought about it. It should do everything. >> >> Right, and possibly this means that no additional kernel support is >> required -- we just make it a libc or libutil interface. > > I can't see how that could possibly work... > > If you do a dup2(), the original fd is closed, and that still does not > release all threads that may be sleeping on it in device drivers. I see interrupting current consumers as a separable issue from invalidating the file descriptor for future users. I'm not convinced there's a good general solution for interrupting current consumers of a file descriptor -- we can improve the semantics for a few objects (i.e., sockets) if required, but I'm not sure it generalizes well.
For sockets, generally speaking, calling shutdown(2) is the approved way to initiate a disconnect, which will lead to other consumers being kicked out of operations on the file descriptor, rather than close(2), which in general doesn't initiate a disconnect because it's a reference count operation on the underlying object. Robert N M Watson Computer Laboratory University of Cambridge From babkin at verizon.net Mon Jul 7 17:28:26 2008 From: babkin at verizon.net (Sergey Babkin) Date: Mon Jul 7 17:28:32 2008 Subject: Proposal: a revoke() system call Message-ID: <22302744.211651215451685258.JavaMail.root@vms227.mailsrvcs.net> >> My thinking has been that if close() wakes them up, then things would be >> inherited from there. The thing I didn't know is that apparently in many cases close() >> doesn't wake them up. >> >> -SB >> > >In cases where I need to wake the select() up immediately for cases such >as this, I've implemented a "trigger pipe" that I include on the select >list. This is a simple pipe, that is written to by the application in Yep, This is the design I'm trying to avoid :-) -SB From babkin at verizon.net Mon Jul 7 17:33:06 2008 From: babkin at verizon.net (Sergey Babkin) Date: Mon Jul 7 17:33:13 2008 Subject: Proposal: a revoke() system call Message-ID: <22395548.214801215451968131.JavaMail.root@vms227.mailsrvcs.net> >>> issue that calling close(2) on a socket from one thread doesn't interrupt a >>> socket in a blocking I/O call from another thread -- you first have to call >>> shutdown(2), and then close(2). This has caused problems for Java in the >>> past, but I'm not sure that it's really a bug given that it's not >>> unreasonable behavior not rejected by the spec :-). >> >> Maybe I'll see if I can fix that. > >Well, fixing this is easy -- instead of holding a reference to the file >descriptor over the system call, hold a reference to the socket. 
The problem >with that is that it creates a lot more contention on the socket locks when >the reference count is dropped, not to mention more locking operations. This >can be fixed but requires quite a lot of work, whereas this rather minor >semantic issue is a non-problem in practice. I do have dealing with this I can't comment much without actually looking at the code, but why would the contention on close() be such an issue? Close() is not called that often, compared for example to read(), so there should not be much contention to start with. And why not just call the shutdown() logic from inside the close() implementation? -SB From babkin at verizon.net Mon Jul 7 18:49:37 2008 From: babkin at verizon.net (Sergey Babkin) Date: Mon Jul 7 18:49:44 2008 Subject: Proposal: a revoke() system call Message-ID: <9820978.224231215452958017.JavaMail.root@vms227.mailsrvcs.net> >From: Poul-Henning Kamp >In message <20080707162733.V63144@fledge.watson.org>, Robert Watson writes: > >>>> achieve something of the same end by opening /dev/null and then dup2()'ing >>>> to the file descriptor you want to revoke, perhaps? Right now there's a >>>> known >>> >>> That's a great idea. I haven't thought about it. It should do everything. >> >>Right, and possibly this means that no additional kernel support is required >>-- we just make it a libc or libutil interface. > >I can't see how that could possibly work... > >If you do a dup2(), the original fd is closed, and that still does not >release all threads that may be sleeping on it in device drivers. Device drivers definitely would be a pain. I guess it depends on the semantics of the driver close() routine in cdevsw. Even if it's called every time a process does close() - well, assuming that a process didn't share it through fork(), and if it did then the last process's close() - then the driver might still not be handling correctly the wake-up of all the threads coming from this file descriptor.
But I guess it's again connected to whether they can handle close() from a multithreaded application, with other threads still trying to read. Maybe make a new entry in devsw, requesting the driver to wake up all the sleepers (either coming from a particular file descriptor, or all the sleepers at all) and return EINTR for them. Then if we replace the entry in the file table first, the interrupted threads would handle the signal as usual, come back and find that when they try to restart the call on this descriptor, they get a hard error. Hm, maybe even a new devsw entry is not needed. Just pretend delivering a signal to all the threads, skipping the ones that aren't currently sleeping on a file I/O (i.e. running or sleeping on a synchronization primitive). Then just don't call any signal handler for this pretend-signal, return an EINTR and let it be handled in the usual way. -SB From das at FreeBSD.ORG Mon Jul 7 18:52:34 2008 From: das at FreeBSD.ORG (David Schultz) Date: Mon Jul 7 18:52:40 2008 Subject: Proposal: a revoke() system call In-Reply-To: <1878557.67061215443549669.JavaMail.root@vms074.mailsrvcs.net> References: <1878557.67061215443549669.JavaMail.root@vms074.mailsrvcs.net> Message-ID: <20080707182302.GA34751@zim.MIT.EDU> On Mon, Jul 07, 2008, Sergey Babkin wrote: > >>Rationale: > >> > >>In the multithreaded programs often multiple threads work with the > >>same file descriptor. A particularly typical situation is a reader > >>thread and a writer thread. The reader thread calls read(), gets > >>blocked until it gets more data, then processes the data and > >>continues the loop. Another example of a "reader thread" would be > >>the main thread of a daemon that accepts the incoming connections > >>and starts new per-connection threads. > > > >Have you tried to implement the functionality you're asking for ? > > > >You'll have to hunt down into all sorts of protocols, drivers > >and other code to find the threads sleeping on your fd so you can > >wake them.
> > My thinking has been that if close() wakes them up, then things would be > inherited from there. The thing I didn't know is that apparently in many cases close() > doesn't wake them up. In Solaris, if you close a file descriptor that has blocked readers, the readers wake up and read() returns 0 bytes (EOF). (At least this is true if you close the local end of a pipe.) It seems like implementing the same behavior in FreeBSD would address your problem without introducing a new system call. Is there a good reason why this might not be the right thing to do? From kensmith at cse.Buffalo.EDU Mon Jul 7 19:32:53 2008 From: kensmith at cse.Buffalo.EDU (Ken Smith) Date: Mon Jul 7 19:33:00 2008 Subject: Proposal: a revoke() system call In-Reply-To: <20080707000313.P56885@fledge.watson.org> References: <48714866.906912CC@verizon.net> <20080707000313.P56885@fledge.watson.org> Message-ID: <1215457201.89956.11.camel@neo.cse.buffalo.edu> On Mon, 2008-07-07 at 00:05 +0100, Robert Watson wrote: > You could > achieve something of the same end by opening /dev/null and then dup2()'ing to > the file descriptor you want to revoke, perhaps? I might be missing something but isn't this what the deadfs vnodeops are for? -- Ken Smith - From there to here, from here to | kensmith@cse.buffalo.edu there, funny things are everywhere. | - Theodore Geisel | -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: This is a digitally signed message part Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080707/1ac3aeeb/attachment.pgp From phk at phk.freebsd.dk Mon Jul 7 19:56:15 2008 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Mon Jul 7 19:56:21 2008 Subject: Proposal: a revoke() system call In-Reply-To: Your message of "Mon, 07 Jul 2008 12:49:17 EST." 
<9820978.224231215452958017.JavaMail.root@vms227.mailsrvcs.net> Message-ID: <59733.1215460572@critter.freebsd.dk> In message <9820978.224231215452958017.JavaMail.root@vms227.mailsrvcs.net>, Sergey Babkin writes: >Maybe make a new entry in devsw, requesting the driver to wake up >all the sleepers (either coming from a particular file descriptor, or all >the sleepers at all) and return EINTR for them. We have that, it's called ->d_purge(), but the semantics are "driver or hardware going away", not "get rid of this thread/fd". As I said earlier, this requires careful thought. If the only reason we are discussing this, is that people find the magic pipe to select ugly, then I would even argue that we have not reached critical mass for even thinking. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From brde at optusnet.com.au Mon Jul 7 20:03:23 2008 From: brde at optusnet.com.au (Bruce Evans) Date: Mon Jul 7 20:03:29 2008 Subject: Proposal: a revoke() system call In-Reply-To: <20080707182302.GA34751@zim.MIT.EDU> References: <1878557.67061215443549669.JavaMail.root@vms074.mailsrvcs.net> <20080707182302.GA34751@zim.MIT.EDU> Message-ID: <20080708051956.L1122@besplex.bde.org> On Mon, 7 Jul 2008, David Schultz wrote: > On Mon, Jul 07, 2008, Sergey Babkin wrote: >>>> Rationale: >>>> >>>> In the multithreaded programs often multiple threads work with the >>>> same file descriptor. A particularly typical situation is a reader >>>> thread and a writer thread. The reader thread calls read(), gets >>>> blocked until it gets more data, then processes the data and >>>> continues the loop. Another example of a "reader thread" would be >>>> the main thread of a daemon that accepts the incoming connections >>>> and starts new per-connection threads. >>> >>> Have you tried to implement the functionality you're asking for ?
>>> >>> You'll have to hunt down into all sorts of protocols, drivers >>> and other code to find the threads sleeping on your fd so you can >>> wake them. >> >> My thinking has been that if close() wakes them up, then things would be >> inherited from there. The thing I didn't know is that apparently in many cases close() >> doesn't wake them up. > > In Solaris, if you close a file descriptor that has blocked > readers, the readers wake up and read() returns 0 bytes (EOF). > (At least this is true if you close the local end of a pipe.) > It seems like implementing the same behavior in FreeBSD would > address your problem without introducing a new system call. > Is there a good reason why this might not be the right thing to do? Does this happen even for non-last closes of all file types? Pipes are too simple :-). Under FreeBSD, ordinary revoke(2) needs to do wake up all readers and synchronize with them (preferably without waiting for them), but it has never done this. The kernel has no mechanism for finding threads sleeping or doing i/o on an fd short of what fstat does (searching half of kmem for hints). Only a small amount of progress has been made in fixing this in the 20 years that revoke() has existed. Most of the necessary wakeups don't occur. A few occur accidentally. So it is normal for threads to be left active after revoke() completes, and the progress is mainly that the devfs and conf layers try harder to prevent deallocation of active data structures for devices in this state. The active threads may do some damage when they wake up with a closed or a new generation of open device, but usually don't. Tty drivers use a generation count to prevent some uses of new generations of opens, but don't check it in enough places. I haven't noticed any other class of drivers doing even this much. Since revoke() is used mainly on tty devices and the generation count almost works for these, these bugs are rarely noticed. 
Bruce From babkin at verizon.net Mon Jul 7 21:51:01 2008 From: babkin at verizon.net (Sergey Babkin) Date: Mon Jul 7 21:51:08 2008 Subject: Proposal: a revoke() system call Message-ID: <9484951.340521215467447990.JavaMail.root@vms126.mailsrvcs.net> >From: David Schultz >On Mon, Jul 07, 2008, Sergey Babkin wrote: >> >>Rationale: >> >> >> >>In the multithreaded programs often multiple threads work with the >> >>same file descriptor. A particularly typical situation is a reader >> >>thread and a writer thread. The reader thread calls read(), gets >> >>blocked until it gets more data, then processes the data and >> >>continues the loop. Another example of a "reader thread" would be >> >>the main thread of a daemon that accepts the incoming connections >> >>and starts new per-connection threads. >> > >> >Have you tried to implement the functionality you're asking for ? >> > >> >You'll have to hunt down into all sorts of protocols, drivers >> >and other code to find the threads sleeping on your fd so you can >> >wake them. >> >> My thinking has been that if close() wakes them up, then things would be >> inherited from there. The thing I didn't know is that apparently in many cases close() >> doesn't wake them up. > >In Solaris, if you close a file descriptor that has blocked >readers, the readers wake up and read() returns 0 bytes (EOF). >(At least this is true if you close the local end of a pipe.) >It seems like implementing the same behavior in FreeBSD would >address your problem without introducing a new system call. >Is there a good reason why this might not be the right thing to do? No, actually I didn't realize that FreeBSD has this issue at all :-) My experience comes from Linux and Solaris implementations. The issue is that close() introduces a race between setting the fd number in the application data and closing the socket.
The reader works like this pseudocode:

    int fd;
    fd = mystructure.fd;
    if (fd < 0)
        return -1;
    return read(fd, ...);

This leaves a small race window between fd is checked and read() is executed. If in the meantime another thread does close() (and sets mystructure.fd to -1), and the third thread does open() then the result of this open would use the same fd number as our old fd (since now it's likely to be the lowest available number), then read() would happen on a completely wrong file. And yes, it does happen in real world. The best workaround I've come up with is a small pause between setting mystructure.fd = -1 and calling close(). The point of proposal is to do a close() without freeing the file descriptor. -SB From babkin at verizon.net Mon Jul 7 21:57:01 2008 From: babkin at verizon.net (Sergey Babkin) Date: Mon Jul 7 21:57:08 2008 Subject: Proposal: a revoke() system call Message-ID: <29793635.342951215467793639.JavaMail.root@vms126.mailsrvcs.net> >From: Poul-Henning Kamp >If the only reason we are discussing this, is that people find the >magic pipe to select ugly, then I would even argue that we have not >reached critical mass for even thinking. Well, there are a couple of problems with magical pipes: 1. It means using 3 times as many file descriptors. (One for the original socket, and 2 for the ends of the pipe). 2. When working with the 3rd-party libraries, it requires a substantial rework of these libraries. Getting a file descriptor from inside the library's implementation and forcing it to close is a lot less invasive and can be done with a simple API wrapper.
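For reference, the "magic pipe" (trigger pipe) design being weighed here can be sketched roughly as below. This is only an illustration under stated assumptions: the function name wait_readable_or_wake() and its return convention are hypothetical and come from neither the thread nor any library. The reader blocks in select(2) on both the data descriptor and the read end of a pipe; another thread writes one byte to the pipe to wake it.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Hypothetical helper: block until data_fd is readable or wake_fd (the
 * read end of a trigger pipe) has been written to.
 * Returns 1 if only data_fd is ready, 0 if a wake-up was requested,
 * -1 on select() failure (errno set, including EINTR).
 */
static int
wait_readable_or_wake(int data_fd, int wake_fd)
{
	fd_set rfds;
	int maxfd;

	/* select() can only handle descriptors below FD_SETSIZE. */
	assert(data_fd >= 0 && data_fd < FD_SETSIZE);
	assert(wake_fd >= 0 && wake_fd < FD_SETSIZE);

	FD_ZERO(&rfds);
	FD_SET(data_fd, &rfds);
	FD_SET(wake_fd, &rfds);
	maxfd = data_fd > wake_fd ? data_fd : wake_fd;

	if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
		return (-1);
	/* The wake-up takes priority so a shutdown request is never missed. */
	if (FD_ISSET(wake_fd, &rfds))
		return (0);
	return (1);
}
```

The cost listed above is visible in the signature: every monitored socket needs a companion pipe (two more descriptors), and any library that calls read() directly would have to be changed to go through a wrapper like this.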
-SB From rwatson at FreeBSD.org Mon Jul 7 23:06:42 2008 From: rwatson at FreeBSD.org (Robert Watson) Date: Mon Jul 7 23:06:49 2008 Subject: Proposal: a revoke() system call In-Reply-To: <1215457201.89956.11.camel@neo.cse.buffalo.edu> References: <48714866.906912CC@verizon.net> <20080707000313.P56885@fledge.watson.org> <1215457201.89956.11.camel@neo.cse.buffalo.edu> Message-ID: <20080708000132.K63144@fledge.watson.org> On Mon, 7 Jul 2008, Ken Smith wrote: > On Mon, 2008-07-07 at 00:05 +0100, Robert Watson wrote: >> You could achieve something of the same end by opening /dev/null and then >> dup2()'ing to the file descriptor you want to revoke, perhaps? > > I might be missing something but isn't this what the deadfs vnodeops are > for? It's a little different, although similar. When a vnode is deadfs'd, such as after a call to revoke(2)'s historic implementation, all open file descriptors on the file are invalidated. I think that Sergey is suggesting semantics in which only the current file descriptor referring to the object is invalidated -- other independently acquired file descriptors in other processes would remain valid. BTW, this does show up one of the potential semantic conflicts in the proposed new revoke behavior: suppose a TCP connection is opened, and two processes have references to the file descriptor for the connection. One of those processes is multi-threaded, and has a blocking read(2) on the file descriptor in one thread, and calls close(2) from another thread. Is the proposal to cancel in-progress I/O's against the file descriptor even though the connection isn't closing due to the further reference to the same descriptor in another process?
Solaris has a pretty complex infrastructure to support that sort of in-kernel cancellation -- the shutdown(2) behavior we have is fairly different in that it manipulates connection state to cancel outstanding I/O's, and would also affect the second process, rather than simply consumers on the one file descriptor. Robert N M Watson Computer Laboratory University of Cambridge From rwatson at FreeBSD.org Mon Jul 7 23:15:26 2008 From: rwatson at FreeBSD.org (Robert Watson) Date: Mon Jul 7 23:15:33 2008 Subject: Proposal: a revoke() system call In-Reply-To: <22395548.214801215451968131.JavaMail.root@vms227.mailsrvcs.net> References: <22395548.214801215451968131.JavaMail.root@vms227.mailsrvcs.net> Message-ID: <20080708000701.R63144@fledge.watson.org> On Mon, 7 Jul 2008, Sergey Babkin wrote: >> Well, fixing this is easy -- instead of holding a reference to the file >> descriptor over the system call, hold a reference to the socket. The >> problem with that is that it creates a lot more contention on the socket >> locks when the reference count is dropped, not to mention more locking >> operations. This can be fixed but requires quite a lot of work, whereas >> this rather minor semantic issue is a non-problem in practice. I do have >> dealing with this > > I can't comment much without actually looking at the code, but why would the > contention on close() be such an issue? Close() is not called that often, > compared for example to read(), so there should not be much contention to > start with. And why not just call the shutdown() logic from inside close() > implementation? This is a fairly complex issue, and one that doesn't lend itself to in-depth discussion without first looking at the code. To direct your reading, I recommend starting with the socket reference model -- you can find a high-level summary in the comments at the head of uipc_socket.c, and the comments on sofree(9). 
The question you're getting at indirectly has to do with the differences between fdrop(9), which drops a reference to a file descriptor, and fputsock(9), which drops a reference to a socket. You'll also find it useful to do a bit of reading regarding the difference between close(2), which releases a reference to a file descriptor from userspace, and fo_close(9), which is invoked in-kernel when the last reference to a file descriptor goes away. Robert N M Watson Computer Laboratory University of Cambridge From rwatson at FreeBSD.org Mon Jul 7 23:23:02 2008 From: rwatson at FreeBSD.org (Robert Watson) Date: Mon Jul 7 23:23:08 2008 Subject: Proposal: a revoke() system call In-Reply-To: <9484951.340521215467447990.JavaMail.root@vms126.mailsrvcs.net> References: <9484951.340521215467447990.JavaMail.root@vms126.mailsrvcs.net> Message-ID: <20080708001929.E63144@fledge.watson.org> On Mon, 7 Jul 2008, Sergey Babkin wrote: > This leaves a small race window between fd is checked and read() is > executed. If in the meantime another thread does close() (and sets > mystructure.fd to -1), and the third thread does open() then the result of > this open would use the same fd number as our old fd (since now it's likely > to be the lowest available number), then read() would happen on a completely > wrong file. And yes, it does happen in real world. The best workaround I've > come up with is a small pause between setting mystructure.fd = -1 and > calling close(). > > The point of proposal is to do a close() without freeing the file > descriptor. Which can be accomplished by calling dup2(2) to replace the file descriptor with another file descriptor, perhaps one to /dev/null. It would be worth carefully reviewing the implementation of dup2(2) to make sure that the close->replace there is atomic with respect to other threads simultaneously allocating file descriptors, such as with pipe(2). This won't cancel existing I/Os, but per discussion, I/O cancelation is a very complicated issue. 
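A minimal sketch of the dup2(2)-onto-/dev/null idea, in C. The helper name fd_neutralize() is hypothetical (it is not an existing libc or libutil interface). As noted above, this only invalidates the descriptor for future users: it does not wake threads already blocked on the old object, and whether the close-and-replace step inside dup2(2) is atomic with respect to concurrent descriptor allocation is exactly the open question, which this sketch does not settle.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: make fd refer to /dev/null without ever freeing
 * the descriptor number, so a concurrent open() cannot be handed the
 * same number. Returns 0 on success, -1 on error with errno set.
 */
static int
fd_neutralize(int fd)
{
	int nullfd, rc;

	assert(fd >= 0);

	nullfd = open("/dev/null", O_RDWR);
	if (nullfd < 0)
		return (-1);
	if (nullfd == fd) {
		/*
		 * fd was not open to begin with, so open() reused its
		 * number; it now already refers to /dev/null.
		 */
		return (0);
	}
	/* Replace whatever fd referred to; the number stays allocated. */
	rc = dup2(nullfd, fd);
	close(nullfd);
	return (rc < 0 ? -1 : 0);
}
```

After the call, a late read() on the stale number returns 0 (EOF) from /dev/null instead of touching an unrelated file that happened to reuse the number; the application can then close the descriptor for real once no thread can still be holding the old value.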
Robert N M Watson Computer Laboratory University of Cambridge From phk at phk.freebsd.dk Tue Jul 8 06:09:48 2008 From: phk at phk.freebsd.dk (Poul-Henning Kamp) Date: Tue Jul 8 06:09:55 2008 Subject: Proposal: a revoke() system call In-Reply-To: Your message of "Mon, 07 Jul 2008 16:56:33 EST." <29793635.342951215467793639.JavaMail.root@vms126.mailsrvcs.net> Message-ID: <701.1215497386@critter.freebsd.dk> In message <29793635.342951215467793639.JavaMail.root@vms126.mailsrvcs.net>, Sergey Babkin writes: >>If the only reason we are discussing this, is that people find the >>magic pipe to select ugly, then I would even argue that we have not >>reached critical mass for even thinking. > >Well, there are a couple of problems with magical pipes: > >1. It means using 3 times as many file descriptors. (One for the original >socket, and 2 for the ends of the pipe). > >2. When working with the 3rd-party libraries, it requires a substantial rework of these >libraries. Getting a file descriptor from inside the library's implementation >and forcing it to close is a lot less invasive and can be done with a simple >API wrapper. What you're proposing to do in the kernel isn't any less complicated :-) -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From des at des.no Tue Jul 8 11:16:52 2008 From: des at des.no (=?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?=) Date: Tue Jul 8 11:16:59 2008 Subject: Proposal: a revoke() system call In-Reply-To: <7100389.65001215443113960.JavaMail.root@vms074.mailsrvcs.net> (Sergey Babkin's message of "Mon\, 07 Jul 2008 10\:05\:13 -0500 \(CDT\)") References: <7100389.65001215443113960.JavaMail.root@vms074.mailsrvcs.net> Message-ID: <86ej647ien.fsf@ds4.des.no> Sergey Babkin writes: > Robert Watson writes: > > Seems like that conflicts with our existing revoke(2) system call.
> Aha, I guess when I've checked, I've looked at a real old version of > FreeBSD. "real old", as in "three years before FreeBSD even existed"? revoke(2) was introduced in 4.3BSD Reno in 1990. BTW, could you please switch to a MUA that correctly inserts In-Reply-To: and / or References: headers? DES -- Dag-Erling Smørgrav - des@des.no From babkin at verizon.net Tue Jul 8 12:28:06 2008 From: babkin at verizon.net (Sergey Babkin) Date: Tue Jul 8 12:28:12 2008 Subject: Proposal: a revoke() system call References: <9484951.340521215467447990.JavaMail.root@vms126.mailsrvcs.net> <20080708001929.E63144@fledge.watson.org> Message-ID: <48735E52.65BE464B@verizon.net> Robert Watson wrote: > > On Mon, 7 Jul 2008, Sergey Babkin wrote: > > > This leaves a small race window between fd is checked and read() is > > executed. If in the meantime another thread does close() (and sets > > mystructure.fd to -1), and the third thread does open() then the result of > > this open would use the same fd number as our old fd (since now it's likely > > to be the lowest available number), then read() would happen on a completely > > wrong file. And yes, it does happen in real world. The best workaround I've > > come up with is a small pause between setting mystructure.fd = -1 and > > calling close(). > > > > The point of proposal is to do a close() without freeing the file > > descriptor. > > Which can be accomplished by calling dup2(2) to replace the file descriptor > with another file descriptor, perhaps one to /dev/null. It would be worth Yes, dup2() is certainly a better idea than a separate call.
I've just assumed that David is following the discussion so far :-) -SB From ed at 80386.nl Tue Jul 8 14:16:21 2008 From: ed at 80386.nl (Ed Schouten) Date: Tue Jul 8 14:16:34 2008 Subject: MPSAFE TTY schedule - update In-Reply-To: <20080702190901.GS14567@hoeg.nl> References: <20080702190901.GS14567@hoeg.nl> Message-ID: <20080708141620.GG14567@hoeg.nl> Hello everyone, First of all, I am really impressed by the amount of people that have shown interest in helping me drive this project forward: - kris@ and some other people has been testing the patches. So far they found some small bugs, which should all be fixed now. Thanks! - marcel@ and nyan@ have already been working on uart(4). Last time I heard, they may have already gotten it working on certain pieces of PC98 hardware. - kan@ has committed a patch in the mpsafetty P4 branch to make dcons(4) working again. Thank you! I think I'll continue using this schedule, with some very small changes, based on previous discussions: * Ed Schouten wrote: > - Not all drivers have been ported to the new TTY layer yet. These > drivers still need to be ported: sio(4), cy(4), digi(4), ubser(4), > uftdi(4), nmdm(4), ng_h4(4), ng_tty(4), snp(4), rp(4), rc(4), si(4), > umodem(4), dcons(4). This should now read: - Not all drivers have been ported to the new TTY layer yet. These drivers still need to be ported: sio(4), cy(4), digi(4), ubser(4), nmdm(4), ng_h4(4), ng_tty(4), snp(4), rp(4), rc(4), si(4). If time permits, I'll fix nmdm(4). I've also received some messages about si(4) and digi(4), so I'll contact those people to see what we can do here. Someone emailed me about ng_h4(4), ng_tty(4) and snp(4). There are no short term plans to make these drivers work. I am going to implement a new hooks interface into the TTY layer after we've integrated this patchset. I don't want to bring in too much code in a single run. > July 13 2008: > Make uart(4) the default serial port driver, instead of sio(4). 
> sio(4) has not been ported to the new TTY layer and is very hard > to do so. uart(4) has been proven to be more portable than > sio(4) and already supports the hardware we need. It looks like we can do this on i386 and amd64. pc98 still needs some polishing, but I've got full confidence this will be sorted out in time. I'll closely track marcel@'s work the next couple of days to make sure we can do this without breaking too much. > August 3 2008: > Disconnect drivers from the build that haven't been patched in > the MPSAFE TTY branch. I won't disconnect drivers here which are in the progress of being ported. If foo@ sends me an email to say he's working on rc(4) for example, I will leave that driver alone. Again, I should bug people with: > Please, make sure we can make this a smooth transition by > testing/reviewing my code. I tend to generate diffs very often. They can > be downloaded here: > > http://www.il.fontys.nl/~ed/projects/mpsafetty/patches/ Thanks! -- Ed Schouten WWW: http://80386.nl/ -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080708/a8651b76/attachment.pgp From babkin at verizon.net Tue Jul 8 14:45:50 2008 From: babkin at verizon.net (Sergey Babkin) Date: Tue Jul 8 14:45:56 2008 Subject: Proposal: a revoke() system call Message-ID: <14092723.330161215528323681.JavaMail.root@vms075.mailsrvcs.net> >From: =?ISO8859-1?Q?Dag-Erling_Sm=F8rgrav?= >Date: 2008/07/08 Tue AM 06:57:20 EDT >To: Sergey Babkin >Cc: Robert Watson , arch@freebsd.org >Subject: Re: Proposal: a revoke() system call >Sergey Babkin writes: >> Robert Watson writes: >> > Seems like that conflicts with our existing revoke(2) system call. >> Aha, I guess when I've checked, I've looked at a real old version of >> FreeBSD. > >"real old", as in "three years before FreeBSD even existed"? 
revoke(2) >was introduced in 4.3BSD Reno in 1990. I'm looking real stupid right now :-) Maybe I've misspelled it when looked first time. >BTW, could you please switch to a MUA that correctly inserts In-Reply-To: >and / or References: headers? That's my provider's web interface. -SB From rwatson at FreeBSD.org Tue Jul 8 15:26:23 2008 From: rwatson at FreeBSD.org (Robert Watson) Date: Tue Jul 8 15:26:41 2008 Subject: Proposal: a revoke() system call In-Reply-To: <20080708001929.E63144@fledge.watson.org> References: <9484951.340521215467447990.JavaMail.root@vms126.mailsrvcs.net> <20080708001929.E63144@fledge.watson.org> Message-ID: <20080708161802.N89342@fledge.watson.org> On Tue, 8 Jul 2008, Robert Watson wrote: > Which can be accomplished by calling dup2(2) to replace the file descriptor > with another file descriptor, perhaps one to /dev/null. It would be worth > carefully reviewing the implementation of dup2(2) to make sure that the > close->replace there is atomic with respect to other threads simultaneously > allocating file descriptors, such as with pipe(2). BTW, on a similar note to the above: I've noticed there are several spots of relative non-atomicity in the Linux emulation code, where rather than just wrapping existing system calls with binary conversion of arguments and return values, we do a semantic wrapping that is necessarily non-atomic with respect to the native code. For example, consider the Linuxulator open code in linux_common_open():

    error = kern_openat(td, dirfd, path, UIO_SYSSPACE, bsd_flags, mode);

    if (!error) {
        fd = td->td_retval[0];
        /*
         * XXX In between kern_open() and fget(), another process
         * having the same filedesc could use that fd without
         * checking below.
         */
        error = fget(td, fd, &fp);
        if (!error) {
            sx_slock(&proctree_lock);
            PROC_LOCK(p);
            if (!(bsd_flags & O_NOCTTY) &&
                SESS_LEADER(p) && !(p->p_flag & P_CONTROLT)) {
                PROC_UNLOCK(p);
                sx_unlock(&proctree_lock);
                if (fp->f_type == DTYPE_VNODE)
                    (void) fo_ioctl(fp, TIOCSCTTY, (caddr_t) 0,
                        td->td_ucred, td);
            } else {
                PROC_UNLOCK(p);
                sx_sunlock(&proctree_lock);
            }
            if (l_flags & LINUX_O_DIRECTORY) {
                if (fp->f_type != DTYPE_VNODE ||
                    fp->f_vnode->v_type != VDIR) {
                    error = ENOTDIR;
                }
            }
            fdrop(fp, td);
            /*
             * XXX as above, fdrop()/kern_close() pair is racy.
             */
            if (error)
                kern_close(td, fd);
        }
    }

I think that comment is mine, or at least, got there because of a comment I made to Roman or the like. The fd has not yet been explicitly returned to userspace, since the open system call hasn't actually returned, but other threads could use the file descriptor in a system call that could lead to unexpected races. For example, if you dup2() on top of the file descriptor between the return of kern_openat() and the invocation of fget(), fo_ioctl() might be called on the wrong file, or the kern_close() in the error case might get invoked on the "wrong" file descriptor. In these cases, the races are mostly harmless since they involve incorrectly using a file descriptor from a second thread -- since it hasn't been returned yet, it isn't valid yet and the results will be undefined. However, there may well be cases where similar races exist that do affect the semantics of multi-threaded Linux applications, such as having a main event (open()) and an associated event (fo_ioctl()) be non-atomic and allowing a race between them that does have a semantically problematic result. These sorts of edge cases, btw, are one reason why I would *strongly* discourage application writers from doing things like calling close(2) on a file descriptor while still using it from another thread.
:-) Robert N M Watson Computer Laboratory University of Cambridge From ed at 80386.nl Tue Jul 8 15:36:33 2008 From: ed at 80386.nl (Ed Schouten) Date: Tue Jul 8 15:36:40 2008 Subject: Proposal: a revoke() system call In-Reply-To: <20080708161802.N89342@fledge.watson.org> References: <9484951.340521215467447990.JavaMail.root@vms126.mailsrvcs.net> <20080708001929.E63144@fledge.watson.org> <20080708161802.N89342@fledge.watson.org> Message-ID: <20080708153632.GI14567@hoeg.nl> Hello Robert, * Robert Watson wrote: > BTW, on a similar note to the above: I've noticed there are several spots > of relative non-atomicity in the Linux emulation code, where rather than > just wrapping existing system calls with binary conversion of arguments > and return values, we do a semantic wrapping that is necessarily > non-atomic with respect to the native code. For example, consider the > Linuxulator open code in linux_common_open(): I also noticed similar constructs inside the stat() calls, to translate device major/minor numbers. As you can see, some stat() routines call translate_path_major_minor_at() after performing the regular stat() operation. The translate_path_major_minor_at() is implemented by calling kern_openat(). This has three disadvantages: - It is non-atomic. - It can only perform the translation on nodes it has O_RDONLY access to. This shouldn't be a big problem, but may cause inconsistencies when users look around in devfs. - The translation may not always work when the calling process is out of file descriptors. Yours, -- Ed Schouten WWW: http://80386.nl/
From das at FreeBSD.ORG Tue Jul 8 16:46:16 2008 From: das at FreeBSD.ORG (David Schultz) Date: Tue Jul 8 16:46:23 2008 Subject: Proposal: a revoke() system call In-Reply-To: <20080708161802.N89342@fledge.watson.org> References: <9484951.340521215467447990.JavaMail.root@vms126.mailsrvcs.net> <20080708001929.E63144@fledge.watson.org> <20080708161802.N89342@fledge.watson.org> Message-ID: <20080708164853.GA40704@zim.MIT.EDU> On Tue, Jul 08, 2008, Robert Watson wrote: > These sorts of edge cases, btw, are one reason why I would *strongly* > discourage application writers from doing things like calling close(2) on a > file descriptor while still using it from another thread. :-) My reaction is that apps should use standard concurrency control primitives, e.g., pthreads primitives or message queues, to coordinate the activities of multiple threads. There are scads of ways to introduce race conditions when updating various aspects of the process state (the fd table, in this case). Once we start adding special-purpose APIs to facilitate clever lock-free tricks in very specific cases, when will it stop? Next we'll want a special version of exit(), a special version of sigaction(), a special version of free(), and so forth. That said, POSIX does require open() and close() to be atomic, so the Linux emulation layer should be fixed in that regard: 2.9.7 Thread Interactions with Regular File Operations All of the functions chmod(), close(), fchmod(), fcntl(), fstat(), ftruncate(), lseek(), open(), read(), readlink(), stat(), symlink(), and write() shall be atomic with respect to each other in the effects specified in IEEE Std 1003.1-2001 when they operate on regular files.
If two threads each call one of these functions, each call shall either see all of the specified effects of the other call, or none of them. From rwatson at FreeBSD.org Tue Jul 8 16:54:52 2008 From: rwatson at FreeBSD.org (Robert Watson) Date: Tue Jul 8 16:54:57 2008 Subject: Proposal: a revoke() system call In-Reply-To: <20080708153632.GI14567@hoeg.nl> References: <9484951.340521215467447990.JavaMail.root@vms126.mailsrvcs.net> <20080708001929.E63144@fledge.watson.org> <20080708161802.N89342@fledge.watson.org> <20080708153632.GI14567@hoeg.nl> Message-ID: <20080708174957.M41405@fledge.watson.org> On Tue, 8 Jul 2008, Ed Schouten wrote: > I also noticed similar constructs inside the stat() calls, to translate > device major/minor numbers. As you can see, some stat() routines call > translate_path_major_minor_at() after performing the regular stat() > operation. The translate_path_major_minor_at() is implemented by calling > kern_openat(). This has three disadvantages: > > - It is non-atomic. > > - It can only perform the translation on nodes it has O_RDONLY access > to. This shouldn't be a big problem, but may cause inconsistencies > when users look around in devfs. > > - The translation may not always work when the calling process is out of > file descriptors. - Opening a device node can have side effects, such as rewinding tapes, raising DTR on serial lines, triggering errors, or denying access to other consumers due to exclusive access requirements. Robert N M Watson Computer Laboratory University of Cambridge From stefan.lambrev at moneybookers.com Wed Jul 9 14:21:26 2008 From: stefan.lambrev at moneybookers.com (Stefan Lambrev) Date: Wed Jul 9 14:21:33 2008 Subject: Socket not ready problem. Message-ID: <4874C5C4.6080605@moneybookers.com> Greetings, I have few apps installed from ports (clamav-milter, spamassassin-milter and etc) which should chown/chmod their socket to allow apps running with different user to connect to them. 
The problem is that I see a race here. On an SMP machine the app very often finishes its execution and chown/chmod is called next, but the socket has not been opened/created at that point. This is very annoying because I have to change my rc.d scripts by hand every time I update/upgrade. And sometimes I forget ... Could you consider the following patch (or something similar, if you prefer) for inclusion? It allows an rc.d shell script to call a function that waits for the socket and returns once it is ready (or the timeout expires), letting chown/chmod start. The function should be called with 2 parameters, socket path and timeout (and we can manage them from rc.conf):

--- /usr/src/etc/rc.subr 2008-05-20 11:00:14.000000000 +0200
+++ /etc/rc.subr 2008-05-26 17:59:08.000000000 +0200
@@ -1569,4 +1569,22 @@
 fi
+wait_for_socket()
+{
+	_socketpath=$1
+	_timeout=$2
+	if [ -z "${_socketpath}" -o -z "${_timeout}" ]; then
+		err 3 'USAGE: wait_for_socket socketpath timeout'
+	fi
+
+	while [ ${_timeout} -gt 0 ]
+	do
+		[ -S "${_socketpath}" ] && break
+		echo -n "."
+		sleep 1
+		_timeout=$((${_timeout}-1))
+	done
+	echo
+}
+
 _rc_subr_loaded=:

-- Best Wishes, Stefan Lambrev ICQ# 24134177
From jhb at FreeBSD.org Thu Jul 10 02:25:13 2008 From: jhb at FreeBSD.org (John Baldwin) Date: Thu Jul 10 02:25:19 2008 Subject: Proposal: a revoke() system call In-Reply-To: <20080708164853.GA40704@zim.MIT.EDU> References: <9484951.340521215467447990.JavaMail.root@vms126.mailsrvcs.net> <20080708161802.N89342@fledge.watson.org> <20080708164853.GA40704@zim.MIT.EDU> Message-ID: <200807092054.48748.jhb@freebsd.org> On Tuesday 08 July 2008 12:48:53 pm David Schultz wrote: > On Tue, Jul 08, 2008, Robert Watson wrote: > > These sorts of edge cases, btw, are one reason why I would *strongly* > > discourage application writers from doing things like calling close(2) on a > > file descriptor while still using it from another thread.
:-) > > My reaction is that apps should use standard concurrency control > primitives, e.g., pthreads primitives or message queues, to > coordinate the activities of multiple threads. There are scads of > ways to introduce race conditions when updating various aspects of > the process state (the fd table, in this case). Once we start > adding special-purpose APIs to facilitate clever lock-free tricks > in very specific cases, when will it stop? Next we'll want a > special version of exit(), a special version of sigaction(), a > special version of free(), and so forth. I agree, this just sounds like an application bug. Plus, even if we add a new system call that rescues drowning file descriptors it won't really help with writing a portable application anyway unless you get other OSes to adopt a similar API. Just use the extra pipe for messages and/or real locking (in your original example you have an obvious race with the use of 'mystructure' and the solution is Don't Do That(tm)). -- John Baldwin From sson at freebsd.org Thu Jul 10 06:22:02 2008 From: sson at freebsd.org (Stacey Son) Date: Thu Jul 10 06:22:31 2008 Subject: ksyms pseudo driver Message-ID: <4875A5D2.8030902@freebsd.org> Hi, I have created a ksyms pseudo driver for FreeBSD. Included below is the man page. The diffs to the kernel source, the main source files, etc. can be found at: http://people.FreeBSD.org/~sson/ksyms/ The reason I created this driver is for dtrace and the port of the OpenSolaris lockstat(1M) command to FreeBSD. The ksyms driver allows a process to get a quick snapshot of the kernel symbol table including the symbols from any loaded modules. Unlike most other implementations, this ksyms driver maps memory in the process space to store the snapshot at the time /dev/ksyms is opened. It also checks to see if the process already has a snapshot open and won't allow it to open /dev/ksyms again until it closes (and unmaps) its already opened snapshot first.
Of course, this requires the read() handler to bounce the buffer into the kernel first before it is written back out to userspace. (Maybe there is a simple way to do a userspace to userspace copy instead?) The reason I went to all this trouble is to keep /dev/ksyms from turning into an easy way to exhaust all the kernel memory (unintentionally or intentionally).

Let me know if you have any questions, comments, suggestions, and/or reasons why something like this should never be included in FreeBSD.

Best Regards, -stacey.

-----------------------------------------------------------------------------------
KSYMS(4)               FreeBSD Kernel Interfaces Manual               KSYMS(4)

NAME
     ksyms -- kernel symbol table interface

SYNOPSIS
     device ksyms

DESCRIPTION
     The /dev/ksyms character device provides a read-only interface to a
     snapshot of the kernel symbol table.  The in-kernel symbol manager is
     designed to be able to handle many types of symbol tables; however,
     only elf(5) symbol tables are supported by this device.  The ELF format
     image contains two sections: a symbol table and a corresponding string
     table.

     Symbol Table
          The SYMTAB section contains the symbol table entries present in
          the current running kernel, including the symbol table entries of
          any loaded modules.  The symbols are ordered by the kernel module
          load time starting with kernel file symbols first, followed by the
          first loaded module's symbols and so on.

     String Table
          The STRTAB section contains the symbol name strings from the
          kernel and any loaded modules that the symbol table entries
          reference.

     ELF-formatted symbol table data read from the /dev/ksyms file
     represents the state of the kernel at the time when the device is
     opened.  Since /dev/ksyms has no text or data, most of the fields are
     initialized to NULL.  The ksyms driver does not block the loading or
     unloading of modules into the kernel while the /dev/ksyms file is open,
     so the snapshot may contain stale data.

IOCTLS
     The ioctl(2) command codes below are defined in .  The (third) argument
     to the ioctl(2) should be a pointer to the type indicated.

     KIOCGSIZE (size_t)
          Returns the total size of the current symbol table.  This can be
          used when allocating a buffer to make a copy of the kernel symbol
          table.

     KIOCGADDR (void *)
          Returns the address of the kernel symbol table mapped in the
          process memory.

FILES
     /dev/ksyms

ERRORS
     An open(2) of /dev/ksyms will fail if:

     [EBUSY]   The device is already open.  A process must close /dev/ksyms
               before it can be opened again.

     [ENOMEM]  There is a resource shortage in the kernel.

     [ENXIO]   The driver was unsuccessful in creating a snapshot of the
               kernel symbol table.  This may occur if the kernel was in the
               process of loading or unloading a module.

SEE ALSO
     ioctl(2), nlist(3), elf(5), kldload(8)

HISTORY
     A ksyms device exists in many different operating systems.  This
     implementation is similar in function to the Solaris and NetBSD ksyms
     driver.

     The ksyms driver first appeared in FreeBSD 8.0 to support lockstat(1).

BUGS
     Because files can be dynamically linked into the kernel at any time the
     symbol information can vary.  When you open the /dev/ksyms file, you
     have access to an ELF image which represents a snapshot of the state of
     the kernel symbol information at that instant in time.  Keeping the
     device open does not block the loading or unloading of kernel modules.
     To get a new snapshot you must close and re-open the device.

     A process is only allowed to open the /dev/ksyms file once at a time.
     The process must close the /dev/ksyms before it is allowed to open it
     again.

     The ksyms driver uses the calling process' memory address space to
     store the snapshot.  ioctl(2) can be used to get the memory address
     where the symbol table is stored to save kernel memory.  mmap(2) may
     also be used but it will map it to another address.

AUTHORS
     The ksyms driver was written by Stacey Son under the direction of John
     Birrell.
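A minimal userspace sketch of the read-only interface described above, assuming only that /dev/ksyms can be consumed with ordinary read(2) calls until end of file, as the DESCRIPTION says. The helper name sketch_read_all() is hypothetical and not part of the driver; the KIOCGSIZE and KIOCGADDR ioctls are deliberately not used because the header that defines them is not named here.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the ksyms driver): read everything from
 * fd until EOF into a malloc'ed buffer.  On success the buffer is returned
 * (the caller frees it) and its length is stored in *lenp.  On failure
 * NULL is returned.
 */
char *
sketch_read_all(int fd, size_t *lenp)
{
    size_t cap = 64 * 1024;     /* initial guess; doubled when full */
    size_t len = 0;
    char *buf, *nbuf;
    ssize_t n;

    assert(lenp != NULL);
    buf = malloc(cap);
    if (buf == NULL)
        return (NULL);
    for (;;) {
        if (len == cap) {
            /* Buffer is full: grow it before the next read. */
            nbuf = realloc(buf, cap * 2);
            if (nbuf == NULL) {
                free(buf);
                return (NULL);
            }
            buf = nbuf;
            cap *= 2;
        }
        n = read(fd, buf + len, cap - len);
        if (n == 0)
            break;              /* end of the snapshot */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry the read */
            free(buf);
            return (NULL);
        }
        len += (size_t)n;
    }
    *lenp = len;
    return (buf);
}
```

A caller would open("/dev/ksyms", O_RDONLY), pass the descriptor to sketch_read_all(), and close the descriptor afterwards so that a later open can take a fresh snapshot.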
FreeBSD 8.0 April 5, 2008 FreeBSD 8.0 From gallatin at cs.duke.edu Fri Jul 11 19:53:02 2008 From: gallatin at cs.duke.edu (Andrew Gallatin) Date: Fri Jul 11 19:53:09 2008 Subject: ksyms pseudo driver In-Reply-To: <4875A5D2.8030902@freebsd.org>; from sson@freebsd.org on Thu, Jul 10, 2008 at 01:01:31AM -0500 References: <4875A5D2.8030902@freebsd.org> Message-ID: <20080711155232.A96384@grasshopper.cs.duke.edu> Stacey Son [sson@freebsd.org] wrote: > > The reason I created this driver is for dtrace and the port of the > opensolaris lockstat(1M) command to FreeBSD. The ksyms driver allows a > process to get a quick > snapshot of the kernel symbol table including the symbols from any > loaded modules. Very cool! After doing some Solaris work, I've really missed lockstat! This would also be useful for hwpmc. > its already opened snapshot first. Of course, this requires the read() > handler to bounce the buffer into the kernel first before it is written > back out to userspace. (Maybe there is a simple way to do an userspace > to userspace copy instead?) The reason I went to all this trouble is to > keep /dev/ksyms from turning into an easy way to exhaust all the kernel > memory (unintentionally or intentionally). Instead of doing the copy in the kernel, can you just have a simple ioctl which returns the address and size of the snapshot? Then the userspace side can do the copy itself. Drew From sson at freebsd.org Sat Jul 12 01:16:04 2008 From: sson at freebsd.org (Stacey Son) Date: Sat Jul 12 01:16:11 2008 Subject: ksyms pseudo driver In-Reply-To: <20080711155232.A96384@grasshopper.cs.duke.edu> References: <4875A5D2.8030902@freebsd.org> <20080711155232.A96384@grasshopper.cs.duke.edu> Message-ID: <48780661.5050002@freebsd.org> Andrew Gallatin wrote: >> its already opened snapshot first. Of course, this requires the read() >> handler to bounce the buffer into the kernel first before it is written >> back out to userspace. 
(Maybe there is a simple way to do an userspace >> to userspace copy instead?) The reason I went to all this trouble is to >> keep /dev/ksyms from turning into an easy way to exhaust all the kernel >> memory (unintentionally or intentionally). >> > > Instead of doing the copy in the kernel, can you just have a simple > ioctl which returns the address and size of the snapshot? Then the > userspace side can do the copy itself. > Actually that is what the ioctls do now... You can just open /dev/ksyms to create the snapshot and then use ioctl() to get the size and address where the buffer is mapped. Or you can use mmap(). IOCTLS The ioctl(2) command codes below are defined in . The (third) argument to the ioctl(2) should be a pointer to the type indicated. KIOCGSIZE (size_t) Returns the total size of the current symbol table. KIOCGADDR (void *) Returns the address of the kernel symbol table mapped in the process memory. -stacey. From kostikbel at gmail.com Sat Jul 12 05:34:31 2008 From: kostikbel at gmail.com (Kostik Belousov) Date: Sat Jul 12 05:34:38 2008 Subject: ksyms pseudo driver In-Reply-To: <48780661.5050002@freebsd.org> References: <4875A5D2.8030902@freebsd.org> <20080711155232.A96384@grasshopper.cs.duke.edu> <48780661.5050002@freebsd.org> Message-ID: <20080712045837.GD17123@deviant.kiev.zoral.com.ua> On Fri, Jul 11, 2008 at 08:18:25PM -0500, Stacey Son wrote: > Andrew Gallatin wrote: > >>its already opened snapshot first. Of course, this requires the read() > >>handler to bounce the buffer into the kernel first before it is written > >>back out to userspace. (Maybe there is a simple way to do an userspace > >>to userspace copy instead?) The reason I went to all this trouble is to > >>keep /dev/ksyms from turning into an easy way to exhaust all the kernel > >>memory (unintentionally or intentionally). > >> > > > >Instead of doing the copy in the kernel, can you just have a simple > >ioctl which returns the address and size of the snapshot? 
Then the > >userspace side can do the copy itself. > > > Actually that is what the ioctls do now... You can just open > /dev/ksyms to create the snapshot and then use ioctl() to get the size > and address where the buffer is mapped. Or you can use mmap(). Most likely, I miss some obvious reason there. But for me it looks like you do it in the reverse. The natural setup would be to require userspace to supply an allocated memory to the driver, and then the driver fills the memory with symbol table. This solves the problem of exhaustion of kernel address space. As usual, when user-supplied region is too small, driver shall return both an error and new required size. It is understandable that the size is volatile and may be too small for the next call too. But, in fact, kernel symtable does not change too often, so I think even the one iteration mostly succeed. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080712/f7a0cae9/attachment.pgp From ed at 80386.nl Sun Jul 13 07:22:57 2008 From: ed at 80386.nl (Ed Schouten) Date: Sun Jul 13 07:23:04 2008 Subject: MPSAFE TTY schedule In-Reply-To: <20080702190901.GS14567@hoeg.nl> References: <20080702190901.GS14567@hoeg.nl> Message-ID: <20080713072254.GX14567@hoeg.nl> Hello all, * Ed Schouten wrote: > July 13 2008: > Make uart(4) the default serial port driver, instead of sio(4). > sio(4) has not been ported to the new TTY layer and is very hard > to do so. uart(4) has been proven to be more portable than > sio(4) and already supports the hardware we need. Just a small message to inform that I've just changed the default serial port driver on amd64 and i386 to uart(4) (see SVN commit 180487). I've decided to leave pc98 as it is now, because I'd rather let the respective maintainers look into this. Thanks! 
-- Ed Schouten WWW: http://80386.nl/ -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080713/1bd9b3de/attachment.pgp From sson at freebsd.org Mon Jul 14 04:20:29 2008 From: sson at freebsd.org (Stacey Son) Date: Mon Jul 14 04:20:39 2008 Subject: ksyms pseudo driver In-Reply-To: <20080712045837.GD17123@deviant.kiev.zoral.com.ua> References: <4875A5D2.8030902@freebsd.org> <20080711155232.A96384@grasshopper.cs.duke.edu> <48780661.5050002@freebsd.org> <20080712045837.GD17123@deviant.kiev.zoral.com.ua> Message-ID: <487AD49F.6040304@freebsd.org> Kostik Belousov wrote: > Most likely, I miss some obvious reason there. But for me it looks > like you do it in the reverse. The natural setup would be to require > userspace to supply an allocated memory to the driver, and then the > driver fills the memory with symbol table. This solves the problem of > exhaustion of kernel address space. > The snapshot of the consolidated symbol table is made when /dev/ksyms is opened. The storage for the snapshot is allocated in the memory map of the calling process. No kernel address space is used for the snapshot. A temporary buffer is allocated in kernel space in the read() handler (ksyms_read). Right now, for a read, it does two copies: one from user space to the temporary kernel space buffer and a second copy from the kernel space temp buffer and back out to user space. Ideally, it would be nice to do just one user space to user space copy directly in the kernel. > As usual, when user-supplied region is too small, driver shall return > both an error and new required size. It is understandable that the size > is volatile and may be too small for the next call too. But, in fact, > kernel symtable does not change too often, so I think even the one > iteration mostly succeed. 
> The reason the driver tries three times to create a valid snapshot is I couldn't figure out a way (without creating a lock reversal) to temporarily keep modules from being loaded or unloaded while the snapshot is created. I agree that it should be able to create the snapshot on the first iteration in most cases. BTW, you may have noticed the ksyms driver now uses your per-open file private data code which I like much better than using clone_create() for per-descriptor storage. Best Regards, -stacey.
From bugmaster at FreeBSD.org Mon Jul 14 11:06:56 2008 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon Jul 14 11:07:19 2008 Subject: Current problem reports assigned to freebsd-arch@FreeBSD.org Message-ID: <200807141106.m6EB6tM2014358@freefall.freebsd.org>

Current FreeBSD problem reports
Critical problems
Serious problems
Non-critical problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/120749  arch       [request] Suggest upping the default kern.ps_arg_cache

1 problem total.

From kostikbel at gmail.com Tue Jul 15 09:34:10 2008 From: kostikbel at gmail.com (Kostik Belousov) Date: Tue Jul 15 09:34:20 2008 Subject: ksyms pseudo driver In-Reply-To: <487AD49F.6040304@freebsd.org> References: <4875A5D2.8030902@freebsd.org> <20080711155232.A96384@grasshopper.cs.duke.edu> <48780661.5050002@freebsd.org> <20080712045837.GD17123@deviant.kiev.zoral.com.ua> <487AD49F.6040304@freebsd.org> Message-ID: <20080715093402.GO17123@deviant.kiev.zoral.com.ua> On Sun, Jul 13, 2008 at 11:22:55PM -0500, Stacey Son wrote: > Kostik Belousov wrote: > >Most likely, I miss some obvious reason there. But for me it looks > >like you do it in the reverse. The natural setup would be to require > >userspace to supply an allocated memory to the driver, and then the > >driver fills the memory with symbol table. This solves the problem of > >exhaustion of kernel address space.
> > > > The snapshot of the consolidated symbol table is made when /dev/ksyms is > opened. The storage for the snapshot is allocated in the memory map of > the calling process. No kernel address space is used for the snapshot. Again, why this is done this way ? Why not creating snapshot when the user process issues ioctl that supplies neccessary usermode memory to the driver ? > > A temporary buffer is allocated in kernel space in the read() handler > (ksyms_read). Right now, for a read, it does two copies: one from > user space to the temporary kernel space buffer and a second copy from > the kernel space temp buffer and back out to user space. Ideally, it > would be nice to do just one user space to user space copy directly in > the kernel. > > >As usual, when user-supplied region is too small, driver shall return > >both an error and new required size. It is understandable that the size > >is volatile and may be too small for the next call too. But, in fact, > >kernel symtable does not change too often, so I think even the one > >iteration mostly succeed. > > > > The reason the driver tries three times to create a valid snapshot is I > couldn't figure out a way (without creating a lock reversal) to > temporarily keep modules from being loaded or unloaded while the > snapshot is created. I agree that it should be able to create the > snapshot on the first iteration in most cases. > > BTW, you may have noticed the ksyms driver now uses your per-open file > private data code which I like much better than using clone_create() for > per-descriptor storage. Does it work ? Do you have any suggestions for the KPI ? -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080715/1c3319e6/attachment.pgp From sson at freebsd.org Tue Jul 15 13:11:53 2008 From: sson at freebsd.org (Stacey Son) Date: Tue Jul 15 13:11:59 2008 Subject: ksyms pseudo driver In-Reply-To: <20080715093402.GO17123@deviant.kiev.zoral.com.ua> References: <4875A5D2.8030902@freebsd.org> <20080711155232.A96384@grasshopper.cs.duke.edu> <48780661.5050002@freebsd.org> <20080712045837.GD17123@deviant.kiev.zoral.com.ua> <487AD49F.6040304@freebsd.org> <20080715093402.GO17123@deviant.kiev.zoral.com.ua> Message-ID: <487CA2B4.7070604@freebsd.org> Kostik Belousov wrote: >> The snapshot of the consolidated symbol table is made when /dev/ksyms is >> opened. The storage for the snapshot is allocated in the memory map of >> the calling process. No kernel address space is used for the snapshot. >> > Again, why this is done this way ? Why not creating snapshot when the > user process issues ioctl that supplies neccessary usermode memory > to the driver ? > The main reason it is written as a pseudo driver is so it can be used with standard command-line utilities. For example, see the ksyms example in the dtrace manual (http://wikis.sun.com/display/DTrace/Structs+and+Unions). I guess it could still be possible to do in the way you are suggesting but it would require a special 'cat', or something, to allocate the user space buffer and then pass that in driver before it starts reading the symbol table. You could then pipe the output of the "special ksyms cat" to the actual command-line program you wanted to use. Of course, if you had to use a "special ksyms cat" then there would be no reason to make this a pseudo driver. You could simply make it a system call and eliminate a lot of code and calls into the kernel. 
>> BTW, you may have noticed the ksyms driver now uses your per-open file >> private data code which I like much better than using clone_create() for >> per-descriptor storage. >> > Does it work ? Do you have any suggestions for the KPI ? > Yes, it seems to work much better than the previous method (clone_create) but more testing is needed. I was having problems with the clone_create() method when I was running some testing code that would rapidly open /dev/ksyms. open() would fail. I am guessing there may be a race condition between when the device is cloned and actually open'ed. I'll let you know if I have some suggestions for the KPI. -stacey. From kostikbel at gmail.com Tue Jul 15 13:18:11 2008 From: kostikbel at gmail.com (Kostik Belousov) Date: Tue Jul 15 13:18:18 2008 Subject: ksyms pseudo driver In-Reply-To: <487CA2B4.7070604@freebsd.org> References: <4875A5D2.8030902@freebsd.org> <20080711155232.A96384@grasshopper.cs.duke.edu> <48780661.5050002@freebsd.org> <20080712045837.GD17123@deviant.kiev.zoral.com.ua> <487AD49F.6040304@freebsd.org> <20080715093402.GO17123@deviant.kiev.zoral.com.ua> <487CA2B4.7070604@freebsd.org> Message-ID: <20080715131801.GS17123@deviant.kiev.zoral.com.ua> On Tue, Jul 15, 2008 at 08:14:28AM -0500, Stacey Son wrote: > Kostik Belousov wrote: > >>The snapshot of the consolidated symbol table is made when /dev/ksyms is > >>opened. The storage for the snapshot is allocated in the memory map of > >>the calling process. No kernel address space is used for the snapshot. > >> > >Again, why this is done this way ? Why not creating snapshot when the > >user process issues ioctl that supplies neccessary usermode memory > >to the driver ? > > > > The main reason it is written as a pseudo driver is so it can be used > with standard command-line utilities. For example, see the ksyms > example in the dtrace manual > (http://wikis.sun.com/display/DTrace/Structs+and+Unions). 
I guess it > could still be possible to do in the way you are suggesting but it would > require a special 'cat', or something, to allocate the user space buffer > and then pass that in driver before it starts reading the symbol table. > You could then pipe the output of the "special ksyms cat" to the actual > command-line program you wanted to use. Of course, if you had to use > a "special ksyms cat" then there would be no reason to make this a > pseudo driver. You could simply make it a system call and eliminate a > lot of code and calls into the kernel. Would dd bs= work as the "special cat" ? procfs' /proc/pid/map has the similar problem, and there was a procmap program in ports. I believe dd is enough. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080715/6d8ec561/attachment.pgp From gallatin at cs.duke.edu Tue Jul 15 15:53:29 2008 From: gallatin at cs.duke.edu (Andrew Gallatin) Date: Tue Jul 15 15:53:44 2008 Subject: ksyms pseudo driver In-Reply-To: <487CA2B4.7070604@freebsd.org> References: <4875A5D2.8030902@freebsd.org> <20080711155232.A96384@grasshopper.cs.duke.edu> <48780661.5050002@freebsd.org> <20080712045837.GD17123@deviant.kiev.zoral.com.ua> <487AD49F.6040304@freebsd.org> <20080715093402.GO17123@deviant.kiev.zoral.com.ua> <487CA2B4.7070604@freebsd.org> Message-ID: <487CC7EC.5060100@cs.duke.edu> Stacey Son wrote: > The main reason it is written as a pseudo driver is so it can be used > with standard command-line utilities. For example, see the ksyms Ah, now everything is perfectly clear to me. Your method is very clever indeed. Just out of curiosity, how much memory will the entire symbol + strings table require? How often do typical consumers (like dtrace) request them? 
Drew From sson at freebsd.org Tue Jul 15 18:25:38 2008 From: sson at freebsd.org (Stacey Son) Date: Tue Jul 15 18:25:43 2008 Subject: ksyms pseudo driver In-Reply-To: <487CC7EC.5060100@cs.duke.edu> References: <4875A5D2.8030902@freebsd.org> <20080711155232.A96384@grasshopper.cs.duke.edu> <48780661.5050002@freebsd.org> <20080712045837.GD17123@deviant.kiev.zoral.com.ua> <487AD49F.6040304@freebsd.org> <20080715093402.GO17123@deviant.kiev.zoral.com.ua> <487CA2B4.7070604@freebsd.org> <487CC7EC.5060100@cs.duke.edu> Message-ID: <487CEC3E.3060204@freebsd.org> Andrew Gallatin wrote: > Ah, now everything is perfectly clear to me. Your method is > very clever indeed. > > Just out of curiosity, how much memory will the entire symbol > + strings table require? How often do typical consumers (like dtrace) > request them? On an AMD64 "Generic" kernel with only the ksyms module loaded it is 1523847 bytes. lockstat(1M) will open and read /dev/ksyms once each time it is invoked. For dtrace, it depends on the script but there shouldn't be any reason why it reads it more than once as well. -stacey. From jroberson at jroberson.net Sat Jul 19 03:07:08 2008 From: jroberson at jroberson.net (Jeff Roberson) Date: Sat Jul 19 03:07:13 2008 Subject: witness performance improvements Message-ID: <20080718163231.B954@desktop> Hello, I have a patch that improves witness performance available at: http://people.freebsd.org/~jeff/witness.diff This improvement comes at the cost of some significant space overhead. It changes the witness graph from a linked tree to a matrix based approach. Relationships can be quickly resolved with a table lookup. The table size is WITNESS_COUNT^2, or 1MB with the current count of 1024. This patch also makes struct witness objects persistent even after the last lock using this name has been removed. This is helpful for short lived objects which may be created frequently. 
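The table lookup described above can be sketched roughly as follows. This is not code from witness.diff: the names sketch_add_order() and sketch_is_reversal() are hypothetical, and the one-byte-per-pair layout is an assumption, chosen because 1024 * 1024 one-byte cells come to exactly the 1MB quoted.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_COUNT 1024   /* mirrors the count of 1024 quoted above */

/*
 * sketch_order[a][b] != 0 means "a is known to be acquired before b".
 * One byte per ordered pair is an assumption: 1024 * 1024 bytes = 1MB.
 */
static uint8_t sketch_order[SKETCH_COUNT][SKETCH_COUNT];

/*
 * Record "a before b" and keep the table transitively closed, so that
 * later queries never have to walk a graph.
 */
void
sketch_add_order(int a, int b)
{
    int i, j;

    assert(a >= 0 && a < SKETCH_COUNT);
    assert(b >= 0 && b < SKETCH_COUNT);
    for (i = 0; i < SKETCH_COUNT; i++) {
        /* Only a itself and everything already ordered before a. */
        if (i != a && !sketch_order[i][a])
            continue;
        for (j = 0; j < SKETCH_COUNT; j++) {
            /* b itself and everything already ordered after b. */
            if (j == b || sketch_order[b][j])
                sketch_order[i][j] = 1;
        }
    }
}

/*
 * Constant-time check: while holding "held", would acquiring "next"
 * contradict an order that has already been recorded?
 */
int
sketch_is_reversal(int held, int next)
{
    assert(held >= 0 && held < SKETCH_COUNT);
    assert(next >= 0 && next < SKETCH_COUNT);
    return (sketch_order[next][held] != 0);
}
```

The cost moves from the check to the update: recording a new order touches up to SKETCH_COUNT^2 cells, while every later check is a single array read, which is the trade of space for lookup speed that the message describes.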
To reduce lock contention on SMP witness_checkorder() now runs without the w_mtx when there are no lock violations. I also cache a lock_list_entry in each thread as allocating these requires the w_mtx. The entry is disposed of at thread_exit(). There is also a new sysctl that produces dot output which graphs lock order relationships with the graphviz program. Most of this work was done by Ilya Maykov while he was at Isilon systems. The locking work and some cleanup/porting/refinement was done by me on behalf of Nokia. The performance improvement can be significant. It is only on the order of 10-20% for buildkernel but on a packet forwarding test at nokia it sped things up by 5x putting a witness enabled kernel within about 50% of the performance of a kernel without. I believe buildworld isn't helped as much because forking and exiting a lot would then contend on the witness lock. I'm mostly interested in hearing what people have to say about the space bloat. I believe it is in a commit ready state. Thanks, Jeff From julian at elischer.org Sat Jul 19 07:42:57 2008 From: julian at elischer.org (Julian Elischer) Date: Sat Jul 19 07:43:03 2008 Subject: witness performance improvements In-Reply-To: <20080718163231.B954@desktop> References: <20080718163231.B954@desktop> Message-ID: <48819885.7040901@elischer.org> Jeff Roberson wrote: > Hello, > > I have a patch that improves witness performance available at: > > http://people.freebsd.org/~jeff/witness.diff > > This improvement comes at the cost of some significant space overhead. > It changes the witness graph from a linked tree to a matrix based > approach. Relationships can be quickly resolved with a table lookup. > The table size is WITNESS_COUNT^2, or 1MB with the current count of 1024. > > This patch also makes struct witness objects persistent even after the > last lock using this name has been removed. This is helpful for short > lived objects which may be created frequently. 
> > To reduce lock contention on SMP witness_checkorder() now runs without > the w_mtx when there are no lock violations. I also cache a > lock_list_entry in each thread as allocating these requires the w_mtx. > The entry is disposed of at thread_exit(). > > There is also a new sysctl that produces dot output which graphs lock > order relationships with the graphviz program. > cool... got sample output? > Most of this work was done by Ilya Maykov while he was at Isilon > systems. The locking work and some cleanup/porting/refinement was done > by me on behalf of Nokia. > > The performance improvement can be significant. It is only on the order > of 10-20% for buildkernel but on a packet forwarding test at nokia it > sped things up by 5x putting a witness enabled kernel within about 50% > of the performance of a kernel without. I believe buildworld isn't > helped as much because forking and exiting a lot would then contend on > the witness lock. > > I'm mostly interested in hearing what people have to say about the space > bloat. I believe it is in a commit ready state. Since witness is not usually on production systems, I don't see a problem with giving it 1 MB. > > Thanks, > Jeff > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" From attilio at freebsd.org Sat Jul 19 12:55:09 2008 From: attilio at freebsd.org (Attilio Rao) Date: Sat Jul 19 12:55:16 2008 Subject: witness performance improvements In-Reply-To: <20080718163231.B954@desktop> References: <20080718163231.B954@desktop> Message-ID: <3bbf2fe10807190525y65facf80uad2a974619198186@mail.gmail.com> 2008/7/19, Jeff Roberson : > Hello, > > I have a patch that improves witness performance available at: > > http://people.freebsd.org/~jeff/witness.diff > > This improvement comes at the cost of some significant space overhead.
It > changes the witness graph from a linked tree to a matrix based approach. > Relationships can be quickly resolved with a table lookup. The table size > is WITNESS_COUNT^2, or 1MB with the current count of 1024. > > This patch also makes struct witness objects persistent even after the last > lock using this name has been removed. This is helpful for short lived > objects which may be created frequently. > > To reduce lock contention on SMP witness_checkorder() now runs without the > w_mtx when there are no lock violations. I also cache a lock_list_entry in > each thread as allocating these requires the w_mtx. The entry is disposed > of at thread_exit(). > > There is also a new sysctl that produces dot output which graphs lock order > relationships with the graphviz program. As I already said, I don't like this. I mostly prefer the current approach (comma-separated stuff) that one can shape as needed. If you also think there is some information the current sysctl doesn't export but should, we could fix that, but IMHO we should axe this part of the patch (I still have to look at this patch, but I recall that in Isilon's version a good number of structures and modifications existed just to handle that part). > Most of this work was done by Ilya Maykov while he was at Isilon systems. > The locking work and some cleanup/porting/refinement was done by me on > behalf of Nokia. > > The performance improvement can be significant. It is only on the order of > 10-20% for buildkernel but on a packet forwarding test at nokia it sped > things up by 5x putting a witness enabled kernel within about 50% of the > performance of a kernel without. I believe buildworld isn't helped as much > because forking and exiting a lot would then contend on the witness lock. > > I'm mostly interested in hearing what people have to say about the space > bloat. I believe it is in a commit ready state. This should not be a big problem; a kernel that has WITNESS is a debugging kernel after all.
I hope I will have more time for a detailed revision in the day. Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein From lme at FreeBSD.org Sat Jul 19 14:12:38 2008 From: lme at FreeBSD.org (Lars Engels) Date: Sat Jul 19 14:12:45 2008 Subject: witness performance improvements In-Reply-To: <20080718163231.B954@desktop> References: <20080718163231.B954@desktop> Message-ID: <20080719141236.GF56464@e.0x20.net> On Fri, Jul 18, 2008 at 04:41:58PM -1000, Jeff Roberson wrote: > Hello, > > I have a patch that improves witness performance available at: > > http://people.freebsd.org/~jeff/witness.diff > > This improvement comes at the cost of some significant space overhead. > It changes the witness graph from a linked tree to a matrix based > approach. Relationships can be quickly resolved with a table lookup. > The table size is WITNESS_COUNT^2, or 1MB with the current count of > 1024. > > This patch also makes struct witness objects persistent even after the > last lock using this name has been removed. This is helpful for short > lived objects which may be created frequently. > > To reduce lock contention on SMP witness_checkorder() now runs without > the w_mtx when there are no lock violations. I also cache a > lock_list_entry in each thread as allocating these requires the w_mtx. > The entry is disposed of at thread_exit(). > > There is also a new sysctl that produces dot output which graphs lock > order relationships with the graphviz program. > > Most of this work was done by Ilya Maykov while he was at Isilon > systems. The locking work and some cleanup/porting/refinement was done > by me on behalf of Nokia. > > The performance improvement can be significant. It is only on the > order of 10-20% for buildkernel but on a packet forwarding test at > nokia it sped things up by 5x putting a witness enabled kernel within > about 50% of the performance of a kernel without. 
I believe > buildworld isn't helped as much because forking and exiting a lot > would then contend on the witness lock. > > I'm mostly interested in hearing what people have to say about the > space bloat. I believe it is in a commit ready state. > > Thanks, > Jeff The speed improvement is significant here (Core Duo Machine). The kernel build time went from ~8:30 min to 5:30 min. But when I run sysctl -a the kernel panics. kgdb output:

Unread portion of the kernel message buffer:
panic: Assertion mcount == fcount failed at /usr/src/sys/kern/subr_witness.c:2882
cpuid = 1
KDB: enter: panic
panic: from debugger
cpuid = 1
KDB: stack backtrace:
Physical memory: 1002 MB
Dumping 68 MB: 53 37 21 5

(kgdb) bt
#0 doadump () at pcpu.h:196
#1 0xc0476c69 in db_fncall (dummy1=-1061959744, dummy2=0, dummy3=3, dummy4=0xe54ce8ec "") at /usr/src/sys/ddb/db_command.c:516
#2 0xc0477218 in db_command (last_cmdp=0xc099a9b0, cmd_table=0x0, dopager=1) at /usr/src/sys/ddb/db_command.c:413
#3 0xc047734a in db_command_loop () at /usr/src/sys/ddb/db_command.c:466
#4 0xc0478b3d in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:228
#5 0xc0618d56 in kdb_trap (type=3, code=0, tf=0xe54cea94) at /usr/src/sys/kern/subr_kdb.c:534
#6 0xc087ec36 in trap (frame=0xe54cea94) at /usr/src/sys/i386/i386/trap.c:683
#7 0xc08634bb in calltrap () at /usr/src/sys/i386/i386/exception.s:165
#8 0xc0618eda in kdb_enter (why=0xc08dc49c "panic", msg=0xc08dc49c "panic") at cpufunc.h:60
#9 0xc05ebe9c in panic (fmt=0xc08d72ed "Assertion %s failed at %s:%d") at /usr/src/sys/kern/kern_shutdown.c:556
#10 0xc062bf62 in sysctl_debug_witness_cyclegraph (oidp=0xc096bec0, arg1=0x0, arg2=0, req=0xe54ceba4) at /usr/src/sys/kern/subr_witness.c:2882
#11 0xc05f59d7 in sysctl_root (oidp=Variable "oidp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:1325
#12 0xc05f5b05 in userland_sysctl (td=0xc49c08c0, name=0xe54cec10, namelen=3, old=0x0, oldlenp=0xbfbfd9b0, inkernel=0, new=0x0, newlen=0, retval=0xe54cec70, flags=0) at /usr/src/sys/kern/kern_sysctl.c:1420
#13 0xc05f5f4c in __sysctl (td=0xc49c08c0, uap=0xe54cecf8) at /usr/src/sys/kern/kern_sysctl.c:1355
#14 0xc087e3b3 in syscall (frame=0xe54ced38) at /usr/src/sys/i386/i386/trap.c:1081
#15 0xc0863520 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:261
#16 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)

The kernel sources are one week old. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080719/9cb5d78a/attachment.pgp From xcllnt at mac.com Sat Jul 19 17:59:31 2008 From: xcllnt at mac.com (Marcel Moolenaar) Date: Sat Jul 19 17:59:37 2008 Subject: RFC: cross-libkvm/libthread_db/proc_service Message-ID: <34889018-8358-46AC-897E-32767FB84E14@mac.com> All, We have a couple of interfaces/APIs that can't be used cross-platform. Take for example libkvm. On a 32-bit platform, we can't typically use libkvm on a 64-kernel, because the libkvm interface uses u_long for the target address, which on 32-bit platforms is 32 bits wide. Likewise, libthread_db and proc_service are designed for native use only and need API tweaks to work in a cross-environment. Both use psaddr_t to represent a target address, which is defined as a void* in . I'd like to change those interfaces/APIs to allow them to be used in a cross-platform debugging environment. Basically, this means that a target address will have to be defined as a uint64_t. Other datatypes may also need to be retyped. For libkvm in particular I don't want to redefine struct kinfo_proc, struct nlist, etc.
While it could be useful in a hybrid 32/64-bit environment, the effect of such changes have too high a chance to trickle down various other components/interfaces. Thus, for libkvm the focus is on kvm_read() and kvm_write(). Suggested plan of attack: o add kvm_xread() and kvm_xwrite() to the libkvm API to minimize the overall impact. The new functions operate on a 64-bit target address (psaddr_t). o change psaddr_t from a void* to a 64-bit integral (sys/procfs.h) This affects proc_service and libthread_db. And consequently our threading support in GDB. Comments/thoughts? -- Marcel Moolenaar xcllnt@mac.com From xcllnt at mac.com Sat Jul 19 18:42:47 2008 From: xcllnt at mac.com (Marcel Moolenaar) Date: Sat Jul 19 18:42:54 2008 Subject: RFC: cross-libkvm/libthread_db/proc_service In-Reply-To: <20080719183725.GM17123@deviant.kiev.zoral.com.ua> References: <34889018-8358-46AC-897E-32767FB84E14@mac.com> <20080719183725.GM17123@deviant.kiev.zoral.com.ua> Message-ID: <6EA6C2B0-EF45-4A65-A455-65700BA6B024@mac.com> On Jul 19, 2008, at 11:37 AM, Kostik Belousov wrote: > On Sat, Jul 19, 2008 at 10:59:29AM -0700, Marcel Moolenaar wrote: >> All, >> >> We have a couple of interfaces/APIs that can't be used cross- >> platform. >> >> Take for example libkvm. On a 32-bit platform, we can't typically use >> libkvm on a 64-kernel, because the libkvm interface uses u_long for >> the target address, which on 32-bit platforms is 32 bits wide. >> >> Likewise, libthread_db and proc_service are designed for native use >> only and need API tweaks to work in a cross-environment. Both use >> psaddr_t to represent a target address, which is defined as a void* >> in . >> >> I'd like to change those interfaces/APIs to allow them to be used in >> a cross-platform debugging environment. Basically, this means that a >> target address will have to be defined as a uint64_t. Other datatypes >> may also need to be retyped. 
>> >> For libkvm in particular I don't want to redefine struct kinfo_proc, >> struct nlist, etc. While it could be useful in a hybrid 32/64-bit >> environment, the effect of such changes have too high a chance to >> trickle down various other components/interfaces. Thus, for libkvm >> the focus is on kvm_read() and kvm_write(). >> >> Suggested plan of attack: >> o add kvm_xread() and kvm_xwrite() to the libkvm API to minimize >> the overall impact. The new functions operate on a 64-bit target >> address (psaddr_t). >> o change psaddr_t from a void* to a 64-bit integral (sys/procfs.h) >> This affects proc_service and libthread_db. And consequently our >> threading support in GDB. >> >> Comments/thoughts? > > I do not object to the idea, but thing to consider is the backward > compatibility. In other words, how much harder would it be to run, > e.g., an RELENG_7 jail on the current kernel after the change ? The impact on the kernel is exactly nil. psaddr_t is not used by the kernel or its interfaces. It's only used for debug related functionality. FYI, -- Marcel Moolenaar xcllnt@mac.com From kostikbel at gmail.com Sat Jul 19 18:48:13 2008 From: kostikbel at gmail.com (Kostik Belousov) Date: Sat Jul 19 18:48:20 2008 Subject: RFC: cross-libkvm/libthread_db/proc_service In-Reply-To: <34889018-8358-46AC-897E-32767FB84E14@mac.com> References: <34889018-8358-46AC-897E-32767FB84E14@mac.com> Message-ID: <20080719183725.GM17123@deviant.kiev.zoral.com.ua> On Sat, Jul 19, 2008 at 10:59:29AM -0700, Marcel Moolenaar wrote: > All, > > We have a couple of interfaces/APIs that can't be used cross-platform. > > Take for example libkvm. On a 32-bit platform, we can't typically use > libkvm on a 64-kernel, because the libkvm interface uses u_long for > the target address, which on 32-bit platforms is 32 bits wide. > > Likewise, libthread_db and proc_service are designed for native use > only and need API tweaks to work in a cross-environment. 
Both use > psaddr_t to represent a target address, which is defined as a void* > in . > > I'd like to change those interfaces/APIs to allow them to be used in > a cross-platform debugging environment. Basically, this means that a > target address will have to be defined as a uint64_t. Other datatypes > may also need to be retyped. > > For libkvm in particular I don't want to redefine struct kinfo_proc, > struct nlist, etc. While it could be useful in a hybrid 32/64-bit > environment, the effect of such changes have too high a chance to > trickle down various other components/interfaces. Thus, for libkvm > the focus is on kvm_read() and kvm_write(). > > Suggested plan of attack: > o add kvm_xread() and kvm_xwrite() to the libkvm API to minimize > the overall impact. The new functions operate on a 64-bit target > address (psaddr_t). > o change psaddr_t from a void* to a 64-bit integral (sys/procfs.h) > This affects proc_service and libthread_db. And consequently our > threading support in GDB. > > Comments/thoughts? I do not object to the idea, but thing to consider is the backward compatibility. In other words, how much harder would it be to run, e.g., an RELENG_7 jail on the current kernel after the change ? -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080719/e1ceb4dc/attachment.pgp From jroberson at jroberson.net Sun Jul 20 09:33:12 2008 From: jroberson at jroberson.net (Jeff Roberson) Date: Sun Jul 20 09:33:18 2008 Subject: witness performance improvements In-Reply-To: <3bbf2fe10807190525y65facf80uad2a974619198186@mail.gmail.com> References: <20080718163231.B954@desktop> <3bbf2fe10807190525y65facf80uad2a974619198186@mail.gmail.com> Message-ID: <20080719233219.O954@desktop> On Sat, 19 Jul 2008, Attilio Rao wrote: > 2008/7/19, Jeff Roberson : >> Hello, >> >> I have a patch that improves witness performance available at: >> >> http://people.freebsd.org/~jeff/witness.diff >> >> This improvement comes at the cost of some significant space overhead. It >> changes the witness graph from a linked tree to a matrix based approach. >> Relationships can be quickly resolved with a table lookup. The table size >> is WITNESS_COUNT^2, or 1MB with the current count of 1024. >> >> This patch also makes struct witness objects persistent even after the last >> lock using this name has been removed. This is helpful for short lived >> objects which may be created frequently. >> >> To reduce lock contention on SMP witness_checkorder() now runs without the >> w_mtx when there are no lock violations. I also cache a lock_list_entry in >> each thread as allocating these requires the w_mtx. The entry is disposed >> of at thread_exit(). >> >> There is also a new sysctl that produces dot output which graphs lock order >> relationships with the graphviz program. > > As I alredy said, I don't like this. > I mostly prefer the current approach (comma separated stuff) that one > can shape as its need. 
> If you also think there are some informations the current sysctl > doesn't export and it should we could fix it, but IMHO we should axe > this part of the patch (I have still to look at this patch, but I > remind in the Isilon's version it was a good amount of structures and > modifies just to handle that part). Can you estimate how much effort it would take to port your previous graph solution to the current witness code? > >> Most of this work was done by Ilya Maykov while he was at Isilon systems. >> The locking work and some cleanup/porting/refinement was done by me on >> behalf of Nokia. >> >> The performance improvement can be significant. It is only on the order of >> 10-20% for buildkernel but on a packet forwarding test at nokia it sped >> things up by 5x putting a witness enabled kernel within about 50% of the >> performance of a kernel without. I believe buildworld isn't helped as much >> because forking and exiting a lot would then contend on the witness lock. >> >> I'm mostly interested in hearing what people have to say about the space >> bloat. I believe it is in a commit ready state. > > This should not be a big problem, it is a debugging kernel after all > if it has WITNESS. > > I hope I will have more time for a detailed revision in the day. I would appreciate that. Thanks, Jeff > > Thanks, > Attilio > > > -- > Peace can only be achieved by understanding - A. Einstein > From ed at 80386.nl Sun Jul 20 12:34:57 2008 From: ed at 80386.nl (Ed Schouten) Date: Sun Jul 20 12:35:04 2008 Subject: MPSAFE TTY schedule In-Reply-To: <20080702190901.GS14567@hoeg.nl> References: <20080702190901.GS14567@hoeg.nl> Message-ID: <20080720123256.GE21188@hoeg.nl> Hello everyone, Today is July 20, which means I'm supposed to send you a message: * Ed Schouten wrote: > July 20 2008: > Send another heads-up to the lists about the new TTY layer. > Kindly ask people to test the patchset, port more drivers, etc. As usual, the latest mpsafetty patchset can be found here. 
I would really appreciate it if I could get more reviews on the code. Thanks! http://www.il.fontys.nl/~ed/projects/mpsafetty/patches/ The following drivers have not been ported to the new TTY layer yet: cy(4), digi(4), ng_h4(4), ng_tty(4), nmdm(4), rc(4), rp(4), si(4), sio(4), snp(4), ubser(4). I've been working on nmdm(4). I'll probably get it working in time. If not, it will be fixed not long after the integration next month. The line disciplines like snp(4), ng_tty(4) and ng_h4(4) can only be fixed after the import, because the hooks layer will be written after the import. In other news: kris@ reported a possible performance regression to me. He discovered `make -C /usr/ports index' consumed more system time on his hardware when the mpsafetty patches were applied. For some reason, I'm unable to reproduce it. I even experience a performance gain when running mpsafetty, which is quite plausible, because I've made some small improvements to `struct session' locking and we no longer pick up Giant in kern_proc.c. Because kris@ committed a patch to improve `make index' performance yesterday, I re-ran my tests today, and they show that the performance difference is now nil. Here are the raw numbers: http://80386.nl/files/mpsafetty-stats.txt Maybe someone is interested in performing more thorough benchmarks? Yours, -- Ed Schouten WWW: http://80386.nl/ -------------- next part -------------- A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 195 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20080720/d5ed41ea/attachment.pgp From keramida at ceid.upatras.gr Sun Jul 20 18:46:25 2008 From: keramida at ceid.upatras.gr (Giorgos Keramidas) Date: Sun Jul 20 18:46:32 2008 Subject: MPSAFE TTY schedule In-Reply-To: <20080720123256.GE21188@hoeg.nl> (Ed Schouten's message of "Sun, 20 Jul 2008 14:32:56 +0200") References: <20080702190901.GS14567@hoeg.nl> <20080720123256.GE21188@hoeg.nl> Message-ID: <87wsjg2yo6.fsf@kobe.laptop> On Sun, 20 Jul 2008 14:32:56 +0200, Ed Schouten wrote: > Hello everyone, > > Today is July 20, which means I'm supposed to send you a message: > > * Ed Schouten wrote: >> July 20 2008: >> Send another heads-up to the lists about the new TTY layer. >> Kindly ask people to test the patchset, port more drivers, etc. > > As usual, the latest mpsafetty patchset can be found here. I would > really appreciate it if I could get more reviews on the code. Thanks! > > http://www.il.fontys.nl/~ed/projects/mpsafetty/patches/ Hi Ed, I see the latest patch at: http://www.il.fontys.nl/~ed/projects/mpsafetty/patches/mpsafetty-20080720.diff.gz Kris has mentioned that it breaks tcsh's autodetection for ptys (we've seen that before when /dev/pts kern.pts.enable=1 was added), so I'd like to build a test kernel+world with the patch to check this. Anything I should be careful about? From bugmaster at FreeBSD.org Mon Jul 21 11:06:53 2008 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon Jul 21 11:07:13 2008 Subject: Current problem reports assigned to freebsd-arch@FreeBSD.org Message-ID: <200807211106.m6LB6qts031812@freefall.freebsd.org> Current FreeBSD problem reports Critical problems Serious problems Non-critical problems S Tracker Resp. 
Description -------------------------------------------------------------------------------- o kern/120749 arch [request] Suggest upping the default kern.ps_arg_cache 1 problem total. From jhb at freebsd.org Mon Jul 21 20:55:49 2008 From: jhb at freebsd.org (John Baldwin) Date: Mon Jul 21 20:55:55 2008 Subject: RFC: cross-libkvm/libthread_db/proc_service In-Reply-To: <34889018-8358-46AC-897E-32767FB84E14@mac.com> References: <34889018-8358-46AC-897E-32767FB84E14@mac.com> Message-ID: <200807211049.47579.jhb@freebsd.org> On Saturday 19 July 2008 01:59:29 pm Marcel Moolenaar wrote: > All, > > We have a couple of interfaces/APIs that can't be used cross-platform. > > Take for example libkvm. On a 32-bit platform, we can't typically use > libkvm on a 64-kernel, because the libkvm interface uses u_long for > the target address, which on 32-bit platforms is 32 bits wide. > > Likewise, libthread_db and proc_service are designed for native use > only and need API tweaks to work in a cross-environment. Both use > psaddr_t to represent a target address, which is defined as a void* > in . > > I'd like to change those interfaces/APIs to allow them to be used in > a cross-platform debugging environment. Basically, this means that a > target address will have to be defined as a uint64_t. Other datatypes > may also need to be retyped. > > For libkvm in particular I don't want to redefine struct kinfo_proc, > struct nlist, etc. While it could be useful in a hybrid 32/64-bit > environment, the effect of such changes have too high a chance to > trickle down various other components/interfaces. Thus, for libkvm > the focus is on kvm_read() and kvm_write(). > > Suggested plan of attack: > o add kvm_xread() and kvm_xwrite() to the libkvm API to minimize > the overall impact. The new functions operate on a 64-bit target > address (psaddr_t). > o change psaddr_t from a void* to a 64-bit integral (sys/procfs.h) > This affects proc_service and libthread_db. 
And consequently our > threading support in GDB. > > Comments/thoughts? I think this is ok. However, can't you just make newer (1.1) versions of kvm_read/write instead of adding a new API? -- John Baldwin From jhb at freebsd.org Mon Jul 21 20:56:12 2008 From: jhb at freebsd.org (John Baldwin) Date: Mon Jul 21 20:56:27 2008 Subject: witness performance improvements In-Reply-To: <20080718163231.B954@desktop> References: <20080718163231.B954@desktop> Message-ID: <200807211141.09387.jhb@freebsd.org> On Friday 18 July 2008 10:41:58 pm Jeff Roberson wrote: > Hello, > > I have a patch that improves witness performance available at: > > http://people.freebsd.org/~jeff/witness.diff > > This improvement comes at the cost of some significant space overhead. It > changes the witness graph from a linked tree to a matrix based approach. > Relationships can be quickly resolved with a table lookup. The table size > is WITNESS_COUNT^2, or 1MB with the current count of 1024. Woo! Thanks for polishing this. > This patch also makes struct witness objects persistent even after the > last lock using this name has been removed. This is helpful for short > lived objects which may be created frequently. Originally, the idea was that if one had a LOR bug in a driver, one could kldunload the driver and have WITNESS forget about any orders for the driver's lock, fix the bug, and try again, but the short-lived names problem is much more common in practice, and trying to remove info about a specific lock class from the graph is a bit tenuous, so I think this is the better approach going forward. > To reduce lock contention on SMP witness_checkorder() now runs without the > w_mtx when there are no lock violations. I also cache a lock_list_entry > in each thread as allocating these requires the w_mtx. The entry is > disposed of at thread_exit(). Neat. > I'm mostly interested in hearing what people have to say about the space > bloat. I believe it is in a commit ready state. 
I think the space usage is perfectly fine. Also, now that you malloc the actual witness objects instead of putting them in the BSS (something that should have been done anyway I think), I would make the number of witness objects a loader tunable. -- John Baldwin From eischen at vigrid.com Mon Jul 21 21:11:36 2008 From: eischen at vigrid.com (Daniel Eischen) Date: Mon Jul 21 21:11:43 2008 Subject: RFC: cross-libkvm/libthread_db/proc_service In-Reply-To: <200807211049.47579.jhb@freebsd.org> References: <34889018-8358-46AC-897E-32767FB84E14@mac.com> <200807211049.47579.jhb@freebsd.org> Message-ID: On Mon, 21 Jul 2008, John Baldwin wrote: > On Saturday 19 July 2008 01:59:29 pm Marcel Moolenaar wrote: >> All, >> >> We have a couple of interfaces/APIs that can't be used cross-platform. >> >> Take for example libkvm. On a 32-bit platform, we can't typically use >> libkvm on a 64-kernel, because the libkvm interface uses u_long for >> the target address, which on 32-bit platforms is 32 bits wide. >> >> Likewise, libthread_db and proc_service are designed for native use >> only and need API tweaks to work in a cross-environment. Both use >> psaddr_t to represent a target address, which is defined as a void* >> in . >> >> I'd like to change those interfaces/APIs to allow them to be used in >> a cross-platform debugging environment. Basically, this means that a >> target address will have to be defined as a uint64_t. Other datatypes >> may also need to be retyped. >> >> For libkvm in particular I don't want to redefine struct kinfo_proc, >> struct nlist, etc. While it could be useful in a hybrid 32/64-bit >> environment, the effect of such changes have too high a chance to >> trickle down various other components/interfaces. Thus, for libkvm >> the focus is on kvm_read() and kvm_write(). >> >> Suggested plan of attack: >> o add kvm_xread() and kvm_xwrite() to the libkvm API to minimize >> the overall impact. The new functions operate on a 64-bit target >> address (psaddr_t). 
>> o change psaddr_t from a void* to a 64-bit integral (sys/procfs.h) >> This affects proc_service and libthread_db. And consequently our >> threading support in GDB. >> >> Comments/thoughts? > > I think this is ok. However, can't you just make newer (1.1) versions of > kvm_read/write instead of adding a new API? You mean, "how about symbol versioning it"? -- DE From alfred at freebsd.org Mon Jul 21 22:00:50 2008 From: alfred at freebsd.org (Alfred Perlstein) Date: Mon Jul 21 22:00:57 2008 Subject: RFC: cross-libkvm/libthread_db/proc_service In-Reply-To: References: <34889018-8358-46AC-897E-32767FB84E14@mac.com> <200807211049.47579.jhb@freebsd.org> Message-ID: <20080721214104.GF76659@elvis.mu.org> * Daniel Eischen [080721 14:11] wrote: > On Mon, 21 Jul 2008, John Baldwin wrote: > > >On Saturday 19 July 2008 01:59:29 pm Marcel Moolenaar wrote: > >>All, > >> > >>We have a couple of interfaces/APIs that can't be used cross-platform. > >> > >>Take for example libkvm. On a 32-bit platform, we can't typically use > >>libkvm on a 64-kernel, because the libkvm interface uses u_long for > >>the target address, which on 32-bit platforms is 32 bits wide. > >> > >>Likewise, libthread_db and proc_service are designed for native use > >>only and need API tweaks to work in a cross-environment. Both use > >>psaddr_t to represent a target address, which is defined as a void* > >>in . > >> > >>I'd like to change those interfaces/APIs to allow them to be used in > >>a cross-platform debugging environment. Basically, this means that a > >>target address will have to be defined as a uint64_t. Other datatypes > >>may also need to be retyped. > >> > >>For libkvm in particular I don't want to redefine struct kinfo_proc, > >>struct nlist, etc. While it could be useful in a hybrid 32/64-bit > >>environment, the effect of such changes have too high a chance to > >>trickle down various other components/interfaces. Thus, for libkvm > >>the focus is on kvm_read() and kvm_write(). 
> >> > >>Suggested plan of attack: > >>o add kvm_xread() and kvm_xwrite() to the libkvm API to minimize > >> the overall impact. The new functions operate on a 64-bit target > >> address (psaddr_t). > >>o change psaddr_t from a void* to a 64-bit integral (sys/procfs.h) > >> This affects proc_service and libthread_db. And consequently our > >> threading support in GDB. > >> > >>Comments/thoughts? > > > >I think this is ok. However, can't you just make newer (1.1) versions of > >kvm_read/write instead of adding a new API? > > You mean, "how about symbol versioning it"? Isn't it a bit strange to export 64bit pointers to 32 bit userspace? Once this "hurdle" is breached, everything else will be horked as well, all pointers to internal structures will be hosed as well and need magic to read them. It seems like a real uphill battle to pound 64bit pegs into 32bit holes. I would like to see the "next steps" as in, what is expected to work after this is added and what will need additional shoe-horning to work. It seems to make a lot more sense to allow for 32bit compat sysctls than this. -- - Alfred Perlstein From xcllnt at mac.com Mon Jul 21 22:14:47 2008 From: xcllnt at mac.com (Marcel Moolenaar) Date: Mon Jul 21 22:14:53 2008 Subject: RFC: cross-libkvm/libthread_db/proc_service In-Reply-To: <200807211049.47579.jhb@freebsd.org> References: <34889018-8358-46AC-897E-32767FB84E14@mac.com> <200807211049.47579.jhb@freebsd.org> Message-ID: <0AD2E758-D7E1-41B0-BA31-6FBDEE3470A9@mac.com> On Jul 21, 2008, at 7:49 AM, John Baldwin wrote: > On Saturday 19 July 2008 01:59:29 pm Marcel Moolenaar wrote: >> All, >> >> We have a couple of interfaces/APIs that can't be used cross- >> platform. >> >> Take for example libkvm. On a 32-bit platform, we can't typically use >> libkvm on a 64-kernel, because the libkvm interface uses u_long for >> the target address, which on 32-bit platforms is 32 bits wide. 
>> >> Likewise, libthread_db and proc_service are designed for native use >> only and need API tweaks to work in a cross-environment. Both use >> psaddr_t to represent a target address, which is defined as a void* >> in . >> >> I'd like to change those interfaces/APIs to allow them to be used in >> a cross-platform debugging environment. Basically, this means that a >> target address will have to be defined as a uint64_t. Other datatypes >> may also need to be retyped. >> >> For libkvm in particular I don't want to redefine struct kinfo_proc, >> struct nlist, etc. While it could be useful in a hybrid 32/64-bit >> environment, the effect of such changes have too high a chance to >> trickle down various other components/interfaces. Thus, for libkvm >> the focus is on kvm_read() and kvm_write(). >> >> Suggested plan of attack: >> o add kvm_xread() and kvm_xwrite() to the libkvm API to minimize >> the overall impact. The new functions operate on a 64-bit target >> address (psaddr_t). >> o change psaddr_t from a void* to a 64-bit integral (sys/procfs.h) >> This affects proc_service and libthread_db. And consequently our >> threading support in GDB. >> >> Comments/thoughts? > > I think this is ok. However, can't you just make newer (1.1) > versions of > kvm_read/write instead of adding a new API? The impact will be bigger, though. Any reference to kvm_read/kvm_write needs to be changed then. Think about stock GDB and various ports for example. Are we willing to differentiate from other OSes, provided we are willing to follow it through all the way? 
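To make the address-width concern in this thread concrete, here is a small sketch. It is not libkvm code: the `sketch_` types and functions are hypothetical, a 32-bit `u_long` is modeled with `uint32_t`, and the 64-bit address used in the usage note is only an illustrative value. It shows a 64-bit target address losing its upper half when it is forced through a 32-bit address parameter, while a 64-bit parameter carries it intact.

```c
#include <stdint.h>

/* Models u_long as seen by a 32-bit application (assumption). */
typedef uint32_t sketch_addr32_t;

/* Models the proposed 64-bit target address type (assumption). */
typedef uint64_t sketch_addr64_t;

/*
 * Pass a target address through a 32-bit parameter and report what
 * the callee would actually see: the upper 32 bits are discarded.
 */
uint64_t sketch_via_32bit_param(uint64_t target_addr)
{
    sketch_addr32_t narrowed = (sketch_addr32_t)target_addr;
    return (uint64_t)narrowed;
}

/* Pass a target address through a 64-bit parameter: nothing is lost. */
uint64_t sketch_via_64bit_param(uint64_t target_addr)
{
    sketch_addr64_t wide = target_addr;
    return (uint64_t)wide;
}

/* Returns 1 when the address survives a trip through the 32-bit type. */
int sketch_fits_32bit(uint64_t target_addr)
{
    return sketch_via_32bit_param(target_addr) == target_addr;
}
```

For example, an address such as 0xffffffff80001000 would arrive as 0x80001000 through the 32-bit path, whereas an i386 kernel address like 0xc0476c69 fits either way. Whether the wider type is exposed through new functions or through new versions of the existing ones is the open question here; the sketch only illustrates why some 64-bit address type is needed for the cross case.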
-- Marcel Moolenaar xcllnt@mac.com From brde at optusnet.com.au Wed Jul 23 05:12:11 2008 From: brde at optusnet.com.au (Bruce Evans) Date: Wed Jul 23 05:12:17 2008 Subject: RFC: cross-libkvm/libthread_db/proc_service In-Reply-To: <20080721214104.GF76659@elvis.mu.org> References: <34889018-8358-46AC-897E-32767FB84E14@mac.com> <200807211049.47579.jhb@freebsd.org> <20080721214104.GF76659@elvis.mu.org> Message-ID: <20080723025519.F18257@delplex.bde.org> On Mon, 21 Jul 2008, Alfred Perlstein wrote: > Isn't it a bit strange to export 64bit pointers to 32 bit userspace? Only for pointers in kernel objects, and I think the proposed change doesn't touch that. kvm_read() doesn't use pointers for kernel addresses. It uses unsigned longs. But even uintmax_t is not enough in general, since the application uintmax_t might be too small to represent a kernel pointer. The type used shouldn't be fixed-width, but typedefed in an MD way like vm_offset_t. vm_offset_t gives the correct integral type to use for (mapped) kernel addresses and related compat_fewer_bit[s] type[s] are needed in userland. It would probably be too hard to support the general case which requires the compat types to be arrays or structs. Bruce From brde at optusnet.com.au Wed Jul 23 11:11:26 2008 From: brde at optusnet.com.au (Bruce Evans) Date: Wed Jul 23 11:11:33 2008 Subject: RFC: cross-libkvm/libthread_db/proc_service In-Reply-To: <20080723025519.F18257@delplex.bde.org> References: <34889018-8358-46AC-897E-32767FB84E14@mac.com> <200807211049.47579.jhb@freebsd.org> <20080721214104.GF76659@elvis.mu.org> <20080723025519.F18257@delplex.bde.org> Message-ID: <20080723032109.W18257@delplex.bde.org> On Wed, 23 Jul 2008, Bruce Evans wrote: > On Mon, 21 Jul 2008, Alfred Perlstein wrote: > >> Isn't it a bit strange to export 64bit pointers to 32 bit userspace? > > Only for pointers in kernel objects, and I think the proposed change > doesn't touch that. > > kvm_read() doesn't use pointers for kernel addresses. 
It uses unsigned > longs. But even uintmax_t is not enough in general, since the application > uintmax_t might be too small to represent a kernel pointer. The type > used shouldn't be fixed-width, but typedefed in an MD way like vm_offset_t. > vm_offset_t gives the correct integral type to use for (mapped) kernel > addresses and related compat_fewer_bit[s] type[s] are needed in userland. > It would probably be too hard to support the general case which requires > the compat types to be arrays or structs. Bah, I forgot the original mail which already says to use an integral type named psaddr_t, and that, unfortunately, this seems to need being 64 bits even on pure 32-bit systems in case you want to run an (not quite pure) 32-bit application in compat32 mode on 64-bit system without recompiling. If psaddr_t is 32-bits on i386 but 64-bits on amd64, then pure 32-bit i386 applications won't run in compat32 mode on amd64, though (not quite pure) 32-bit applications compiled on amd64 will. I don't like putting 64-bit knowledge in 32-bit applications but I often compile on i386 and run on amd64. Bruce From xcllnt at mac.com Wed Jul 23 17:14:32 2008 From: xcllnt at mac.com (Marcel Moolenaar) Date: Wed Jul 23 17:14:39 2008 Subject: RFC: cross-libkvm/libthread_db/proc_service In-Reply-To: <20080723032109.W18257@delplex.bde.org> References: <34889018-8358-46AC-897E-32767FB84E14@mac.com> <200807211049.47579.jhb@freebsd.org> <20080721214104.GF76659@elvis.mu.org> <20080723025519.F18257@delplex.bde.org> <20080723032109.W18257@delplex.bde.org> Message-ID: <71C01B9B-1E42-4D65-A3D7-F1DA14123524@mac.com> On Jul 22, 2008, at 10:27 AM, Bruce Evans wrote: > On Wed, 23 Jul 2008, Bruce Evans wrote: > >> On Mon, 21 Jul 2008, Alfred Perlstein wrote: >> >>> Isn't it a bit strange to export 64bit pointers to 32 bit userspace? >> >> Only for pointers in kernel objects, and I think the proposed change >> doesn't touch that. >> >> kvm_read() doesn't use pointers for kernel addresses. 
It uses >> unsigned >> longs. But even uintmax_t is not enough in general, since the >> application >> uintmax_t might be too small to represent a kernel pointer. The type >> used shouldn't be fixed-width, but typedefed in an MD way like >> vm_offset_t. >> vm_offset_t gives the correct integral type to use for (mapped) >> kernel >> addresses and related compat_fewer_bit[s] type[s] are needed in >> userland. >> It would probably be too hard to support the general case which >> requires >> the compat types to be arrays or structs. > > Bah, I forgot the original mail which already says to use an integral > type named psaddr_t, and that, unfortunately, this seems to need being > 64 bits even on pure 32-bit systems in case you want to run an (not > quite pure) 32-bit application in compat32 mode on 64-bit system > without > recompiling. Actually, the intent is more generic (or more limited, depending on how you look at it): the ability to cross-debug any (say ia64) kernel on any other (say powerpc) machine. The integral needs to be constant and equal in width across all platforms. This means that a uint<#>_t is the best option. The largest we support is 64-bit. We can already build a cross gdb from the source tree, but without threading support (libthread_db and proc_service limitation). We can't build a cross kgdb from the source tree because of libkvm. Both I'd like to be able to do.
FYI, -- Marcel Moolenaar xcllnt@mac.com From jroberson at jroberson.net Fri Jul 25 02:30:21 2008 From: jroberson at jroberson.net (Jeff Roberson) Date: Fri Jul 25 02:30:28 2008 Subject: witness performance improvements In-Reply-To: <200807211141.09387.jhb@freebsd.org> References: <20080718163231.B954@desktop> <200807211141.09387.jhb@freebsd.org> Message-ID: <20080724162733.B954@desktop> On Mon, 21 Jul 2008, John Baldwin wrote: > On Friday 18 July 2008 10:41:58 pm Jeff Roberson wrote: >> Hello, >> >> I have a patch that improves witness performance available at: >> >> http://people.freebsd.org/~jeff/witness.diff >> >> This improvement comes at the cost of some significant space overhead. It >> changes the witness graph from a linked tree to a matrix based approach. >> Relationships can be quickly resolved with a table lookup. The table size >> is WITNESS_COUNT^2, or 1MB with the current count of 1024. > > Woo! Thanks for polishing this. > >> This patch also makes struct witness objects persistent even after the >> last lock using this name has been removed. This is helpful for short >> lived objects which may be created frequently. > > Originally, the idea was that if one had a LOR bug in a driver, one could > kldunload the driver and have WITNESS forget about any orders for the > driver's lock, fix the bug, and try again, but the short-lived names problem > is much more common in practice, and trying to remove info about a specific > lock class from the graph is a bit tenuous, so I think this is the better > approach going forward. > >> To reduce lock contention on SMP witness_checkorder() now runs without the >> w_mtx when there are no lock violations. I also cache a lock_list_entry >> in each thread as allocating these requires the w_mtx. The entry is >> disposed of at thread_exit(). > > Neat. > >> I'm mostly interested in hearing what people have to say about the space >> bloat. I believe it is in a commit ready state. 
> > I think the space usage is perfectly fine. Also, now that you malloc the > actual witness objects instead of putting them in the BSS (something that > should have been done anyway I think), I would make the number of witness > objects a loader tunable. Well, I'm glad there is a consensus that this is the right way forward. The state of the code is that there may be a bug in the dot output but I've not had any problems with the regular witness operation. There are still some style bugs in it. Attilio has expressed some interest in a full review and style clean-up. I'd like to get what I have now, minus the dot output, into svn. And then do a set of follow on commits to add back dot or Attilio's comma separated graph output that can be parsed to dot. Any objections to commiting this knowing it has some style bugs and a little work left? I'd like to get people testing the core functionality more. We've sat on this patch for a couple of years now as well. Thanks, Jeff > > -- > John Baldwin > From jhb at freebsd.org Fri Jul 25 16:27:36 2008 From: jhb at freebsd.org (John Baldwin) Date: Fri Jul 25 16:28:28 2008 Subject: witness performance improvements In-Reply-To: <20080724162733.B954@desktop> References: <20080718163231.B954@desktop> <200807211141.09387.jhb@freebsd.org> <20080724162733.B954@desktop> Message-ID: <200807251146.19058.jhb@freebsd.org> On Thursday 24 July 2008 10:30:36 pm Jeff Roberson wrote: > On Mon, 21 Jul 2008, John Baldwin wrote: > > > On Friday 18 July 2008 10:41:58 pm Jeff Roberson wrote: > >> Hello, > >> > >> I have a patch that improves witness performance available at: > >> > >> http://people.freebsd.org/~jeff/witness.diff > >> > >> This improvement comes at the cost of some significant space overhead. It > >> changes the witness graph from a linked tree to a matrix based approach. > >> Relationships can be quickly resolved with a table lookup. The table size > >> is WITNESS_COUNT^2, or 1MB with the current count of 1024. > > > > Woo! 
Thanks for polishing this. > > > >> This patch also makes struct witness objects persistent even after the > >> last lock using this name has been removed. This is helpful for short > >> lived objects which may be created frequently. > > > > Originally, the idea was that if one had a LOR bug in a driver, one could > > kldunload the driver and have WITNESS forget about any orders for the > > driver's lock, fix the bug, and try again, but the short-lived names problem > > is much more common in practice, and trying to remove info about a specific > > lock class from the graph is a bit tenuous, so I think this is the better > > approach going forward. > > > >> To reduce lock contention on SMP witness_checkorder() now runs without the > >> w_mtx when there are no lock violations. I also cache a lock_list_entry > >> in each thread as allocating these requires the w_mtx. The entry is > >> disposed of at thread_exit(). > > > > Neat. > > > >> I'm mostly interested in hearing what people have to say about the space > >> bloat. I believe it is in a commit ready state. > > > > I think the space usage is perfectly fine. Also, now that you malloc the > > actual witness objects instead of putting them in the BSS (something that > > should have been done anyway I think), I would make the number of witness > > objects a loader tunable. > > Well, I'm glad there is a consensus that this is the right way forward. > The state of the code is that there may be a bug in the dot output but > I've not had any problems with the regular witness operation. > > There are still some style bugs in it. Attilio has expressed some > interest in a full review and style clean-up. I'd like to get what I have > now, minus the dot output, into svn. And then do a set of follow on > commits to add back dot or Attilio's comma separated graph output that can > be parsed to dot. > > Any objections to commiting this knowing it has some style bugs and a > little work left? 
I'd like to get people testing the core functionality > more. We've sat on this patch for a couple of years now as well. I think you can commit and work on the other stuff afterwards. I had noticed a few style things too but don't want those to hold this up. -- John Baldwin From attilio at freebsd.org Fri Jul 25 22:17:41 2008 From: attilio at freebsd.org (Attilio Rao) Date: Fri Jul 25 22:17:48 2008 Subject: witness performance improvements In-Reply-To: <20080724162733.B954@desktop> References: <20080718163231.B954@desktop> <200807211141.09387.jhb@freebsd.org> <20080724162733.B954@desktop> Message-ID: <3bbf2fe10807251517v73447626j90458ebcd5345eaf@mail.gmail.com> 2008/7/25, Jeff Roberson : > On Mon, 21 Jul 2008, John Baldwin wrote: > > > On Friday 18 July 2008 10:41:58 pm Jeff Roberson wrote: > > > > > Hello, > > > > > > I have a patch that improves witness performance available at: > > > > > > http://people.freebsd.org/~jeff/witness.diff > > > > > > This improvement comes at the cost of some significant space overhead. > It > > > changes the witness graph from a linked tree to a matrix based approach. > > > Relationships can be quickly resolved with a table lookup. The table > size > > > is WITNESS_COUNT^2, or 1MB with the current count of 1024. > > > > > > > Woo! Thanks for polishing this. > > > > > > > This patch also makes struct witness objects persistent even after the > > > last lock using this name has been removed. This is helpful for short > > > lived objects which may be created frequently. > > > > > > > Originally, the idea was that if one had a LOR bug in a driver, one could > > kldunload the driver and have WITNESS forget about any orders for the > > driver's lock, fix the bug, and try again, but the short-lived names > problem > > is much more common in practice, and trying to remove info about a > specific > > lock class from the graph is a bit tenuous, so I think this is the better > > approach going forward. 
> > > > > > > To reduce lock contention on SMP witness_checkorder() now runs without > the > > > w_mtx when there are no lock violations. I also cache a lock_list_entry > > > in each thread as allocating these requires the w_mtx. The entry is > > > disposed of at thread_exit(). > > > > > > > Neat. > > > > > > > I'm mostly interested in hearing what people have to say about the space > > > bloat. I believe it is in a commit ready state. > > > > > > > I think the space usage is perfectly fine. Also, now that you malloc the > > actual witness objects instead of putting them in the BSS (something that > > should have been done anyway I think), I would make the number of witness > > objects a loader tunable. > > > > Well, I'm glad there is a consensus that this is the right way forward. The > state of the code is that there may be a bug in the dot output but I've not > had any problems with the regular witness operation. > > There are still some style bugs in it. Attilio has expressed some interest > in a full review and style clean-up. I'd like to get what I have now, minus > the dot output, into svn. And then do a set of follow on commits to add > back dot or Attilio's comma separated graph output that can be parsed to > dot. > > Any objections to commiting this knowing it has some style bugs and a little > work left? I'd like to get people testing the core functionality more. > We've sat on this patch for a couple of years now as well. Please go on and commit the code, delaying any further improvement. Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein From bugmaster at FreeBSD.org Mon Jul 28 11:06:53 2008 From: bugmaster at FreeBSD.org (FreeBSD bugmaster) Date: Mon Jul 28 11:07:14 2008 Subject: Current problem reports assigned to freebsd-arch@FreeBSD.org Message-ID: <200807281106.m6SB6qks078850@freefall.freebsd.org> Current FreeBSD problem reports Critical problems Serious problems Non-critical problems S Tracker Resp. 
Description -------------------------------------------------------------------------------- o kern/120749 arch [request] Suggest upping the default kern.ps_arg_cache 1 problem total. From imp at bsdimp.com Tue Jul 29 22:14:39 2008 From: imp at bsdimp.com (M. Warner Losh) Date: Tue Jul 29 22:14:46 2008 Subject: Need a code review Message-ID: <20080729.161303.709402272.imp@bsdimp.com> Greetings, The FreeBSD/mips efforts are getting close. We're down to 4 patches against the main tree, divided up among different programs: cc, binutils, libpam and the CDDL stuff for zfs. http://people.freebsd.org/~gonzo/mips2/binutils.diff http://people.freebsd.org/~gonzo/mips2/cc.diff http://people.freebsd.org/~gonzo/mips2/cddl.diff http://people.freebsd.org/~gonzo/mips2/libpam.diff If you have an interest in any of these areas, or would like to provide feedback on the patches, now would be a good time to do so. :-) We'd like to commit these patches to the tree by the end of next week, if at all possible. If you are a maintainer of this software, we'd especially like to get feedback from you on these patches. If we don't hear back from you, we'll assume that you are fine with them :-) Warner From des at des.no Wed Jul 30 06:49:58 2008 From: des at des.no (Dag-Erling Smørgrav) Date: Wed Jul 30 06:50:06 2008 Subject: Need a code review In-Reply-To: <20080729.161303.709402272.imp@bsdimp.com> (M. Warner Losh's message of "Tue\, 29 Jul 2008 16\:13\:03 -0600 \(MDT\)") References: <20080729.161303.709402272.imp@bsdimp.com> Message-ID: <86r69buar0.fsf@ds4.des.no> "M. Warner Losh" writes: > http://people.freebsd.org/~gonzo/mips2/libpam.diff This won't work. Your patch unconditionally sets NO_STATIC_MODULES which will result in a non-functional libpam.a (the modules will be built into the library, but without any of the glue that allows the library to find them) not just on mips, but on all other platforms.
DES -- Dag-Erling Smørgrav - des@des.no From imp at bsdimp.com Wed Jul 30 18:43:21 2008 From: imp at bsdimp.com (M. Warner Losh) Date: Wed Jul 30 18:43:32 2008 Subject: Need a code review In-Reply-To: <86r69buar0.fsf@ds4.des.no> References: <20080729.161303.709402272.imp@bsdimp.com> <86r69buar0.fsf@ds4.des.no> Message-ID: <20080730.124142.1837098050.imp@bsdimp.com> In message: <86r69buar0.fsf@ds4.des.no> Dag-Erling Smørgrav writes: : "M. Warner Losh" writes: : > http://people.freebsd.org/~gonzo/mips2/libpam.diff : : This won't work. Your patch unconditionally sets NO_STATIC_MODULES : which will result in a non-functional libpam.a (the modules will be : built into the library, but without any of the glue that allows the : library to find them) not just on mips, but on all other platforms. Thanks for the feedback. We'll try to fix it. Good catch. Warner From casparos at yahoo.de Thu Jul 31 02:24:38 2008 From: casparos at yahoo.de (Markus Mueller) Date: Thu Jul 31 02:24:44 2008 Subject: own OS-Name Message-ID: <4891225A.8010505@yahoo.de> I will create my own *BSD OS based on FreeBSD. How can I change the Name of this OS? I mean, that in Logfiles, for example, of servers, which I connect by surfing in the web and in application which locate the OS instead "FREEBSD" another OS-Name "MyOS-Name" will be displayed. Thanks for Helping and fast answers. From casparos at yahoo.de Thu Jul 31 02:34:45 2008 From: casparos at yahoo.de (Markus Mueller) Date: Thu Jul 31 02:34:51 2008 Subject: own OS-Name Message-ID: <489124C3.4060600@yahoo.de> I will create my own *BSD OS based on FreeBSD. How can I change the Name of this OS? I mean, that in Logfiles, for example, of servers, which I connect by surfing in the web and in application which locate the OS instead "FREEBSD" another OS-Name "MyOS-Name" will be displayed.
Thanks for Helping and fast answers. From des at des.no Thu Jul 31 09:49:52 2008 From: des at des.no (Dag-Erling Smørgrav) Date: Thu Jul 31 09:49:59 2008 Subject: own OS-Name In-Reply-To: <489124C3.4060600@yahoo.de> (Markus Mueller's message of "Thu\, 31 Jul 2008 04\:34\:43 +0200") References: <489124C3.4060600@yahoo.de> Message-ID: <86bq0es7r5.fsf@ds4.des.no> Markus Mueller writes: > I will create my own *BSD OS based on FreeBSD. > How can I change the Name of this OS? > I mean, that in Logfiles, for example, of servers, which I connect by > surfing in the web and in application which locate the OS instead > "FREEBSD" another OS-Name "MyOS-Name" will be displayed. You don't want to do that. It will cause you no end of pain with third-party software that relies on uname -s and / or compiler macros (__FreeBSD__) to turn specific features on or off. You will have to patch pretty much every autoconf script in existence. DES -- Dag-Erling Smørgrav - des@des.no