From peter at citylink.dinoex.sub.org Tue Sep 1 18:02:50 2009
From: peter at citylink.dinoex.sub.org (Peter Much)
Date: Tue Sep 1 18:02:57 2009
Subject: crashdump "watchdog timeout" - Howto get useful information?
Message-ID:

Dear all,

could anybody share some insight (or pointers to docs) on how to
approach an analysis of a "watchdog timeout" crashdump?

I hopefully have the necessities in place (that is, I can load
the dump into ddd and actually see things).

But I have no real idea about where to start looking for interesting
things - some structure from where to unroll what the system was
doing (or not doing).
The "developers handbook" mainly explains about figuring the cause
of the crash - but in my case this is obvious, it is the watchdog I
have configured.

Since this is a reproducible issue, ideas on things that could be
configured beforehand could also be useful.

rgds,
PMc

From jhb at freebsd.org Wed Sep 2 12:44:44 2009
From: jhb at freebsd.org (John Baldwin)
Date: Wed Sep 2 12:44:51 2009
Subject: crashdump "watchdog timeout" - Howto get useful information?
In-Reply-To:
References:
Message-ID: <200909020835.47358.jhb@freebsd.org>

On Tuesday 01 September 2009 1:45:44 pm Peter Much wrote:
>
> Dear all,
>
> could anybody share some insight (or pointers to docs) on how to
> approach an analysis of a "watchdog timeout" crashdump?
>
> I hopefully have the necessities in place (that is, I can load
> the dump into ddd and actually see things).
>
> But I have no real idea about where to start looking for interesting
> things - some structure from where to unroll what the system was
> doing (or not doing).
> The "developers handbook" mainly explains about figuring the cause
> of the crash - but in my case this is obvious, it is the watchdog I
> have configured.
>
> Since this is a reproducible issue, ideas on things that could be
> configured beforehand could also be useful.

I would examine the state of the processes in the system first.
If all the CPUs are idle but some threads are blocked on locks, you might
have a deadlock, etc.

You can use the gdb scripts at http://www.FreeBSD.org/~jhb/gdb/ in kgdb to
figure some of that stuff out (source gdb6 from within gdb. I usually start
with 'ps').

--
John Baldwin

From Robert.Eckardt at Robert-Eckardt.de Wed Sep 2 20:33:04 2009
From: Robert.Eckardt at Robert-Eckardt.de (Robert Eckardt)
Date: Wed Sep 2 20:35:18 2009
Subject: ZFS continuously growing
Message-ID: <20090902193059.M336@Robert-Eckardt.de>

Hi folks,

after upgrading my backup server to 8.0-BETA2, I noticed that the
available space shrinks from backup to backup (a tree each day with
differential rsync) although with each new tree the oldest tree gets
removed.

Since I removed some subdirectories on my active server the number
of used inodes now is reduced by approx. 90000 on each run.
At the same time used space grows by between 650MB and 6.7GB and
free space gets reduced by 4.4 to 9GB (see table below). The output
of "df" and "zfs list" is consistent.

Although I understand that the backed-up file by rsync can be much
larger than the data transferred I get worried that without changing
much the available space shrinks continuously. (Remember, the number
of backup trees stays constant since the oldest gets removed and
6GB/d results in more than 1TB over half a year.)

Do I have to be worried?
Is there a memory leak in the current ZFS implementation?
Why is used space growing slower than free space is shrinking?
Is there some garbage collection needed in ZFS?

Besides, although the backup server has 3 GB RAM I had to tune arc_max
to 150MB to copy the backed-up data from a 2.8TB ZFS (v6) to the
4.5 TB ZFS (v13) by "zfs send|zfs recv" without kmalloc panic.
(I.e., the default algorithm was not sufficient.)
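Robert's d-used, d-free and d-inode columns in the table that follows are row-to-row differences of the df figures. Those figures appear to be in 1K blocks, which is why 658.560 corresponds to the "650MB" mentioned above. The sketch below recomputes the three columns. It assumes each row is fed in as a whitespace-separated line beginning with day, rsynced, Used, free and inodes; `deltas` is a hypothetical name, not a tool from the thread.

```sh
#!/bin/sh
# Hypothetical helper: recompute the d-used, d-free and d-inode columns.
# Input: one table row per line, fields separated by whitespace, in the
# order "day rsynced used free inodes ..." (any further columns are ignored).
# Output: "day d-used d-free d-inode" for every row except the first,
# each value being the difference to the previous row.
deltas() {
    awk 'NR > 1 { print $1, $3 - used, $4 - free, $5 - inodes }
         { used = $3; free = $4; inodes = $5 }'
}

# Example with the first two rows of the table:
#   printf '%s\n' '27 57018987 2792986368 1914681984 43854571' \
#                 '28 67181251 2794269440 1910242176 43765134' | deltas
# prints: 28 1283072 -4439808 -89437
```

Run over all six rows, it shows that on every day free space drops by more than used space grows. That gap is what the question above is about. The row for the 30th covers two days, because the 29th is missing from the table.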
Regards,
Robert

day     rsynced        Used        free    inodes  oldest dir     newest dir        d-used       d-free   d-inode
 27    57018987  2792986368  1914681984  43854571  20090224-0917  20090827-0916
 28    67181251  2794269440  1910242176  43765134  20090225-0917  20090828-0916  1.283.072   -4.439.808   -89.437
 30    52078382  2800983296  1897022720  43586320  20090227-0917  20090830-0916  6.713.856  -13.219.456  -178.814
 31  2647268060  2803757056  1891064192  43496712  20090228-0917  20090831-0916  2.773.760   -5.958.528   -89.608
  1    92096258  2804415616  1881965184  43406059  20090301-0917  20090901-0916    658.560   -9.099.008   -90.653
  2   121590303  2807900288  1875341440  43316517  20090302-0917  20090902-0916  3.484.672   -6.623.744   -89.542

--
Dr. Robert Eckardt    ---    Robert.Eckardt@Robert-Eckardt.de

From rol at robert-eckardt.de Wed Sep 2 21:01:47 2009
From: rol at robert-eckardt.de (Robert Eckardt)
Date: Wed Sep 2 21:01:53 2009
Subject: ZFS continuously growing
In-Reply-To: <20090902193059.M336@Robert-Eckardt.de>
References: <20090902193059.M336@Robert-Eckardt.de>
Message-ID: <20090902204845.M67171@Robert-Eckardt.de>

Hi folks,

after upgrading my backup server to 8.0-BETA2, I noticed that the
available space shrinks from backup to backup (a tree each day with
differential rsync) although with each new tree the oldest tree gets
removed.

Since I removed some subdirectories on my active server the number
of used inodes now is reduced by approx. 90000 on each run.
At the same time used space grows by between 650MB and 6.7GB and
free space gets reduced by 4.4 to 9GB (see table below). The output
of "df" and "zfs list" is consistent.

Although I understand that the backed-up file by rsync can be much
larger than the data transferred I get worried that without changing
much the available space shrinks continuously. (Remember, the number
of backup trees stays constant since the oldest gets removed and
6GB/d results in more than 1TB over half a year.)

Do I have to be worried?
Is there a memory leak in the current ZFS implementation?
Why is used space growing slower than free space is shrinking?
Is there some garbage collection needed in ZFS?

Besides, although the backup server has 3 GB RAM I had to tune arc_max
to 150MB to copy the backed-up data from a 2.8TB ZFS (v6) to the
4.5 TB ZFS (v13) by "zfs send|zfs recv" without kmalloc panic.
(I.e., the default algorithm was not sufficient.)

Regards,
Robert

day     rsynced        Used        free    inodes  oldest dir     newest dir        d-used       d-free   d-inode
 27    57018987  2792986368  1914681984  43854571  20090224-0917  20090827-0916
 28    67181251  2794269440  1910242176  43765134  20090225-0917  20090828-0916  1.283.072   -4.439.808   -89.437
 30    52078382  2800983296  1897022720  43586320  20090227-0917  20090830-0916  6.713.856  -13.219.456  -178.814
 31  2647268060  2803757056  1891064192  43496712  20090228-0917  20090831-0916  2.773.760   -5.958.528   -89.608
  1    92096258  2804415616  1881965184  43406059  20090301-0917  20090901-0916    658.560   -9.099.008   -90.653
  2   121590303  2807900288  1875341440  43316517  20090302-0917  20090902-0916  3.484.672   -6.623.744   -89.542

--
Dr. Robert Eckardt    ---    Robert.Eckardt@Robert-Eckardt.de

From peter at citylink.dinoex.sub.org Wed Sep 2 21:13:10 2009
From: peter at citylink.dinoex.sub.org (Peter Much)
Date: Wed Sep 2 21:13:16 2009
Subject: crashdump "watchdog timeout" - Howto get useful information?
References: <200909020835.47358.jhb@freebsd.org>
Message-ID:

In article <200909020835.47358.jhb@freebsd.org>, John Baldwin wrote:
|You can use the gdb scripts at http://www.FreeBSD.org/~jhb/gdb/ in kgdb to
|figure some of that stuff out (source gdb6 from within gdb. I usually start
|with 'ps').

WHOOAAAH, THAT ROCKS!!!!

That's a lot more than I had hoped for!
Many, many thanks!!!
PMc

From ady at freebsd.ady.ro Thu Sep 3 07:42:22 2009
From: ady at freebsd.ady.ro (Adrian Penisoara)
Date: Thu Sep 3 07:42:29 2009
Subject: ZFS continuously growing
In-Reply-To: <20090902193059.M336@Robert-Eckardt.de>
References: <20090902193059.M336@Robert-Eckardt.de>
Message-ID: <78cb3d3f0909030009y505a30by769052258576bfeb@mail.gmail.com>

Hi,

On Wed, Sep 2, 2009 at 10:22 PM, Robert Eckardt wrote:
>
> Hi folks,
>
> after upgrading my backup server to 8.0-BETA2, I noticed that the
> available space shrinks from backup to backup (a tree each day with
> differential rsync) although with each new tree the oldest tree gets
> removed.
>
> Since I removed some subdirectories on my active server the number
> of used inodes now is reduced by approx. 90000 on each run.
> At the same time used space grows by between 650MB and 6.7GB and
> free space gets reduced by 4.4 to 9GB (see table below). The output
> of "df" and "zfs list" is consistent.
>
> Although I understand that the backed-up file by rsync can be much
> larger than the data transferred I get worried that without changing
> much the available space shrinks continuously. (Remember, the number
> of backup trees stays constant since the oldest gets removed and
> 6GB/d results in more than 1TB over half a year.)
>
> Do I have to be worried?
> Is there a memory leak in the current ZFS implementation?
> Why is used space growing slower than free space is shrinking?
> Is there some garbage collection needed in ZFS?
>
> Besides, although the backup server has 3 GB RAM I had to tune arc_max
> to 150MB to copy the backed-up data from a 2.8TB ZFS (v6) to the
> 4.5 TB ZFS (v13) by "zfs send|zfs recv" without kmalloc panic.
> (I.e., the default algorithm was not sufficient.)

Do I take it you are using ZFS snapshots in between rsync'ing
(send/recv requires snapshots)? Could you please post the "zfs
list" output after subsequent runs to clarify?
Regards,
Adrian
EnterpriseBSD

From rol at robert-eckardt.de Thu Sep 3 08:07:02 2009
From: rol at robert-eckardt.de (Robert Eckardt)
Date: Thu Sep 3 08:07:08 2009
Subject: Fw: Re: ZFS continuously growing
In-Reply-To: <20090903071913.M84990@Robert-Eckardt.de>
References: <20090902193059.M336@Robert-Eckardt.de> <78cb3d3f0909030009y505a30by769052258576bfeb@mail.gmail.com> <20090903071913.M84990@Robert-Eckardt.de>
Message-ID: <20090903080558.M13772@Robert-Eckardt.de>

On Thu, 3 Sep 2009 09:09:19 +0200, Adrian Penisoara wrote
> Hi,
>
> On Wed, Sep 2, 2009 at 10:22 PM, Robert Eckardt
> wrote:
> > Do I have to be worried?
> > Is there a memory leak in the current ZFS implementation?
> > Why is used space growing slower than free space is shrinking?
> > Is there some garbage collection needed in ZFS?
> >
> > Besides, although the backup server has 3 GB RAM I had to tune arc_max
> > to 150MB to copy the backed-up data from a 2.8TB ZFS (v6) to the
> > 4.5 TB ZFS (v13) by "zfs send|zfs recv" without kmalloc panic.
> > (I.e., the default algorithm was not sufficient.)
>
> Do I take it you are using ZFS snapshots in between rsync'ing
> (send/recv requires snapshots)? Could you please post the "zfs
> list" output after subsequent runs to clarify?
>
> Regards,
> Adrian
> EnterpriseBSD

Hi Adrian,

No, I'm not using snapshots. Just separate directories, where identical
files are hardlinked by rsync to the version one day older.
The send|recv was necessary when I increased the raidz of the backup-fs.
(Copying everything to two 1.5TB HDDs and after adding disks back again.
I used s.th. like "zfs send bigpool/big@backup | zfs recv big/big".)

Here is the zfs list output of the last five days:
Thu Sep 3 09:36:12 CEST 2009 (Today an additional 2GB of data were transferred.)
big      1861882752          0 1861882752   0%        5 14545959   0%  /big
big/big  4676727168 2814844416 1861882752  60% 43137409 14545959  75%  /big/big
NAME      USED  AVAIL  REFER  MOUNTPOINT
big      2.72T  1.73T  31.5K  /big
big/big  2.72T  1.73T  2.62T  /big/big

Wed Sep 2 09:36:24 CEST 2009
big      1869058944        128 1869058816   0%        5 14602022   0%  /big
big/big  4679698688 2810639872 1869058816  60% 43226966 14602022  75%  /big/big
NAME      USED  AVAIL  REFER  MOUNTPOINT
big      2.72T  1.74T  31.5K  /big
big/big  2.72T  1.74T  2.62T  /big/big

Tue Sep 1 09:36:33 CEST 2009
big      1875352064          0 1875352064   0%        5 14651188   0%  /big
big/big  4683241856 2807889792 1875352064  60% 43316454 14651188  75%  /big/big
NAME      USED  AVAIL  REFER  MOUNTPOINT
big      2.71T  1.75T  31.5K  /big
big/big  2.71T  1.75T  2.62T  /big/big

Mon Aug 31 09:45:26 CEST 2009
big      1881967616        128 1881967488   0%        5 14702871   0%  /big
big/big  4686380928 2804413440 1881967488  60% 43406044 14702871  75%  /big/big
NAME      USED  AVAIL  REFER  MOUNTPOINT
big      2.71T  1.75T  31.5K  /big
big/big  2.71T  1.75T  2.61T  /big/big

Sun Aug 30 09:39:31 CEST 2009
big      1891064192          0 1891064192   0%        5 14773939   0%  /big
big/big  4694821376 2803757184 1891064192  60% 43496712 14773939  75%  /big/big
NAME      USED  AVAIL  REFER  MOUNTPOINT
big      2.70T  1.76T  31.5K  /big
big/big  2.70T  1.76T  2.61T  /big/big

Regards,
Robert

--
Dr. Robert Eckardt    ---    Robert.Eckardt@Robert-Eckardt.de

From rol at robert-eckardt.de Thu Sep 3 09:48:52 2009
From: rol at robert-eckardt.de (Robert Eckardt)
Date: Thu Sep 3 09:48:59 2009
Subject: Fw: Re: ZFS continuously growing [SOLVED]
In-Reply-To:
References: <20090902193059.M336@Robert-Eckardt.de> <78cb3d3f0909030009y505a30by769052258576bfeb@mail.gmail.com> <20090903071913.M84990@Robert-Eckardt.de> <20090903080558.M13772@Robert-Eckardt.de>
Message-ID: <20090903094501.M34547@robert-eckardt.de>

On Thu, 3 Sep 2009 10:01:28 +0100, krad wrote
> 2009/9/3 Robert Eckardt
> On Thu, 3 Sep 2009 09:09:19 +0200, Adrian Penisoara wrote
> > Hi,
> >
> > On Wed, Sep 2, 2009 at 10:22 PM, Robert Eckardt
> > wrote:
>
> > > Do I have to be worried?
> > > Is there a memory leak in the current ZFS implementation?
> > > Why is used space growing slower than free space is shrinking?
> > > Is there some garbage collection needed in ZFS?
> > >
> > > Besides, although the backup server has 3 GB RAM I had to tune arc_max
> > > to 150MB to copy the backed-up data from a 2.8TB ZFS (v6) to the
> > > 4.5 TB ZFS (v13) by "zfs send|zfs recv" without kmalloc panic.
> > > (I.e., the default algorithm was not sufficient.)
>
>
> do a "zfs list -t all"
>
> you will see all snapshots and zvols then as well

Oops, sorry for asking.
Everything is OK after "zfs destroy big/big@backup" :-(

I hope the info on arc_max will stay useful.

Regards,
Robert

--
Dr. Robert Eckardt    ---    Robert.Eckardt@Robert-Eckardt.de

From ady at freebsd.ady.ro Thu Sep 3 10:41:56 2009
From: ady at freebsd.ady.ro (Adrian Penisoara)
Date: Thu Sep 3 10:42:03 2009
Subject: Fw: Re: ZFS continuously growing [SOLVED]
In-Reply-To:
References: <20090902193059.M336@Robert-Eckardt.de> <78cb3d3f0909030009y505a30by769052258576bfeb@mail.gmail.com> <20090903071913.M84990@Robert-Eckardt.de> <20090903080558.M13772@Robert-Eckardt.de> <20090903094501.M34547@robert-eckardt.de>
Message-ID: <78cb3d3f0909030341m27bdab76o7564dddecca3c85@mail.gmail.com>

Hi,

On Thu, Sep 3, 2009 at 12:05 PM, krad wrote:
[...]
>
> There was a change between zfs v7 and v13. In 7, when you did a zfs list it
> would show snapshots; after 13 it didn't unless you supplied the switch. It
> still catches me out as we have a right mix of zfs versions at work, so don't
> feel too bad 8)

Nasty surprise, doesn't that break POLA? :)
I assume that was Sun's change, not something specific to FreeBSD...

Regards,
Adrian
EnterpriseBSD
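The problem in this thread turned out to be a snapshot that a plain `zfs list` on this version no longer shows: `big/big@backup`. It was apparently left behind by the earlier `zfs send bigpool/big@backup | zfs recv big/big` migration, because receiving a full stream keeps the snapshot the stream was made from. The sketch below pulls just the snapshot names out of krad's `zfs list -t all`. It assumes the scripted `-H` output (tab-separated, no header line) with only the `name` and `type` columns requested. `snapshot_names` is a hypothetical helper, not part of ZFS.

```sh
#!/bin/sh
# Hypothetical helper: read "zfs list -H -t all -o name,type" output on stdin
# and print only the names whose type column says "snapshot".
# With -H, zfs prints tab-separated fields and no header line.
snapshot_names() {
    awk -F '\t' '$2 == "snapshot" { print $1 }'
}

# Usage on the backup server, e.g. for the pool "big":
#   zfs list -H -t all -r -o name,type big | snapshot_names
```

The helper deliberately destroys nothing. `zfs destroy` on a snapshot cannot be undone, so check the printed names by hand before running something like Robert's `zfs destroy big/big@backup`.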
From kraduk at googlemail.com Thu Sep 3 09:33:38 2009
From: kraduk at googlemail.com (krad)
Date: Thu Sep 3 11:24:43 2009
Subject: Fw: Re: ZFS continuously growing
In-Reply-To: <20090903080558.M13772@Robert-Eckardt.de>
References: <20090902193059.M336@Robert-Eckardt.de> <78cb3d3f0909030009y505a30by769052258576bfeb@mail.gmail.com> <20090903071913.M84990@Robert-Eckardt.de> <20090903080558.M13772@Robert-Eckardt.de>
Message-ID:

2009/9/3 Robert Eckardt

> On Thu, 3 Sep 2009 09:09:19 +0200, Adrian Penisoara wrote
> > Hi,
> >
> > On Wed, Sep 2, 2009 at 10:22 PM, Robert Eckardt
> > wrote:
> > > Do I have to be worried?
> > > Is there a memory leak in the current ZFS implementation?
> > > Why is used space growing slower than free space is shrinking?
> > > Is there some garbage collection needed in ZFS?
> > >
> > > Besides, although the backup server has 3 GB RAM I had to tune arc_max
> > > to 150MB to copy the backed-up data from a 2.8TB ZFS (v6) to the
> > > 4.5 TB ZFS (v13) by "zfs send|zfs recv" without kmalloc panic.
> > > (I.e., the default algorithm was not sufficient.)
> >
> > Do I take it you are using ZFS snapshots in between rsync'ing
> > (send/recv requires snapshots)? Could you please post the "zfs
> > list" output after subsequent runs to clarify?
> >
> > Regards,
> > Adrian
> > EnterpriseBSD
>
> Hi Adrian,
>
> No, I'm not using snapshots. Just separate directories, where identical
> files are hardlinked by rsync to the version one day older.
> The send|recv was necessary when I increased the raidz of the backup-fs.
> (Copying everything to two 1.5TB HDDs and after adding disks back again.
> I used s.th. like "zfs send bigpool/big@backup | zfs recv big/big".)
>
> Here is the zfs list output of the last five days:
> Thu Sep 3 09:36:12 CEST 2009 (Today an additional 2GB of data were transferred.)
> big      1861882752          0 1861882752   0%        5 14545959   0%  /big
> big/big  4676727168 2814844416 1861882752  60% 43137409 14545959  75%  /big/big
> NAME      USED  AVAIL  REFER  MOUNTPOINT
> big      2.72T  1.73T  31.5K  /big
> big/big  2.72T  1.73T  2.62T  /big/big
>
> Wed Sep 2 09:36:24 CEST 2009
> big      1869058944        128 1869058816   0%        5 14602022   0%  /big
> big/big  4679698688 2810639872 1869058816  60% 43226966 14602022  75%  /big/big
> NAME      USED  AVAIL  REFER  MOUNTPOINT
> big      2.72T  1.74T  31.5K  /big
> big/big  2.72T  1.74T  2.62T  /big/big
>
> Tue Sep 1 09:36:33 CEST 2009
> big      1875352064          0 1875352064   0%        5 14651188   0%  /big
> big/big  4683241856 2807889792 1875352064  60% 43316454 14651188  75%  /big/big
> NAME      USED  AVAIL  REFER  MOUNTPOINT
> big      2.71T  1.75T  31.5K  /big
> big/big  2.71T  1.75T  2.62T  /big/big
>
> Mon Aug 31 09:45:26 CEST 2009
> big      1881967616        128 1881967488   0%        5 14702871   0%  /big
> big/big  4686380928 2804413440 1881967488  60% 43406044 14702871  75%  /big/big
> NAME      USED  AVAIL  REFER  MOUNTPOINT
> big      2.71T  1.75T  31.5K  /big
> big/big  2.71T  1.75T  2.61T  /big/big
>
> Sun Aug 30 09:39:31 CEST 2009
> big      1891064192          0 1891064192   0%        5 14773939   0%  /big
> big/big  4694821376 2803757184 1891064192  60% 43496712 14773939  75%  /big/big
> NAME      USED  AVAIL  REFER  MOUNTPOINT
> big      2.70T  1.76T  31.5K  /big
> big/big  2.70T  1.76T  2.61T  /big/big
>
> Regards,
> Robert
>
> --
> Dr. Robert Eckardt --- Robert.Eckardt@Robert-Eckardt.de
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
>

do a "zfs list -t all"

you will see all snapshots and zvols then as well

From kraduk at googlemail.com Thu Sep 3 09:34:54 2009
From: kraduk at googlemail.com (krad)
Date: Thu Sep 3 11:25:06 2009
Subject: Fw: Re: ZFS continuously growing
In-Reply-To:
References: <20090902193059.M336@Robert-Eckardt.de> <78cb3d3f0909030009y505a30by769052258576bfeb@mail.gmail.com> <20090903071913.M84990@Robert-Eckardt.de> <20090903080558.M13772@Robert-Eckardt.de>
Message-ID:

2009/9/3 krad

>
>
> 2009/9/3 Robert Eckardt
>
> On Thu, 3 Sep 2009 09:09:19 +0200, Adrian Penisoara wrote
>> > Hi,
>> >
>> > On Wed, Sep 2, 2009 at 10:22 PM, Robert Eckardt
>> > wrote:
>> > > Do I have to be worried?
>> > > Is there a memory leak in the current ZFS implementation?
>> > > Why is used space growing slower than free space is shrinking?
>> > > Is there some garbage collection needed in ZFS?
>> > >
>> > > Besides, although the backup server has 3 GB RAM I had to tune arc_max
>> > > to 150MB to copy the backed-up data from a 2.8TB ZFS (v6) to the
>> > > 4.5 TB ZFS (v13) by "zfs send|zfs recv" without kmalloc panic.
>> > > (I.e., the default algorithm was not sufficient.)
>> >
>> > Do I take it you are using ZFS snapshots in between rsync'ing
>> > (send/recv requires snapshots)? Could you please post the "zfs
>> > list" output after subsequent runs to clarify?
>> >
>> > Regards,
>> > Adrian
>> > EnterpriseBSD
>>
>> Hi Adrian,
>>
>> No, I'm not using snapshots. Just separate directories, where identical
>> files are hardlinked by rsync to the version one day older.
>> The send|recv was necessary when I increased the raidz of the backup-fs.
>> (Copying everything to two 1.5TB HDDs and after adding disks back again.
>> I used s.th. like "zfs send bigpool/big@backup | zfs recv big/big".)
>>
>> Here is the zfs list output of the last five days:
>> Thu Sep 3 09:36:12 CEST 2009 (Today an additional 2GB of data were transferred.)
>> big      1861882752          0 1861882752   0%        5 14545959   0%  /big
>> big/big  4676727168 2814844416 1861882752  60% 43137409 14545959  75%  /big/big
>> NAME      USED  AVAIL  REFER  MOUNTPOINT
>> big      2.72T  1.73T  31.5K  /big
>> big/big  2.72T  1.73T  2.62T  /big/big
>>
>> Wed Sep 2 09:36:24 CEST 2009
>> big      1869058944        128 1869058816   0%        5 14602022   0%  /big
>> big/big  4679698688 2810639872 1869058816  60% 43226966 14602022  75%  /big/big
>> NAME      USED  AVAIL  REFER  MOUNTPOINT
>> big      2.72T  1.74T  31.5K  /big
>> big/big  2.72T  1.74T  2.62T  /big/big
>>
>> Tue Sep 1 09:36:33 CEST 2009
>> big      1875352064          0 1875352064   0%        5 14651188   0%  /big
>> big/big  4683241856 2807889792 1875352064  60% 43316454 14651188  75%  /big/big
>> NAME      USED  AVAIL  REFER  MOUNTPOINT
>> big      2.71T  1.75T  31.5K  /big
>> big/big  2.71T  1.75T  2.62T  /big/big
>>
>> Mon Aug 31 09:45:26 CEST 2009
>> big      1881967616        128 1881967488   0%        5 14702871   0%  /big
>> big/big  4686380928 2804413440 1881967488  60% 43406044 14702871  75%  /big/big
>> NAME      USED  AVAIL  REFER  MOUNTPOINT
>> big      2.71T  1.75T  31.5K  /big
>> big/big  2.71T  1.75T  2.61T  /big/big
>>
>> Sun Aug 30 09:39:31 CEST 2009
>> big      1891064192          0 1891064192   0%        5 14773939   0%  /big
>> big/big  4694821376 2803757184 1891064192  60% 43496712 14773939  75%  /big/big
>> NAME      USED  AVAIL  REFER  MOUNTPOINT
>> big      2.70T  1.76T  31.5K  /big
>> big/big  2.70T  1.76T  2.61T  /big/big
>>
>> Regards,
>> Robert
>>
>> --
>> Dr. Robert Eckardt --- Robert.Eckardt@Robert-Eckardt.de
>>
>
> do a "zfs list -t all"
>
> you will see all snapshots and zvols then as well
>
>

It might also be worth doing a "zfs get all | grep copies" just to make sure
you haven't got multiple copies assigned to any fs.

It sounds like for some reason the free blocks are getting put back into the
free block table/map.

Also try scrubbing the zpool for good measure. It's worth running it from
cron once a week.

From kraduk at googlemail.com Thu Sep 3 10:05:43 2009
From: kraduk at googlemail.com (krad)
Date: Thu Sep 3 11:25:18 2009
Subject: Fw: Re: ZFS continuously growing [SOLVED]
In-Reply-To: <20090903094501.M34547@robert-eckardt.de>
References: <20090902193059.M336@Robert-Eckardt.de> <78cb3d3f0909030009y505a30by769052258576bfeb@mail.gmail.com> <20090903071913.M84990@Robert-Eckardt.de> <20090903080558.M13772@Robert-Eckardt.de> <20090903094501.M34547@robert-eckardt.de>
Message-ID:

2009/9/3 Robert Eckardt

> On Thu, 3 Sep 2009 10:01:28 +0100, krad wrote
> > 2009/9/3 Robert Eckardt
> > On Thu, 3 Sep 2009 09:09:19 +0200, Adrian Penisoara wrote
> >
> > > Hi,
> > >
> > > On Wed, Sep 2, 2009 at 10:22 PM, Robert Eckardt
> > > wrote:
> >
> > > > Do I have to be worried?
> > > > Is there a memory leak in the current ZFS implementation?
> > > > Why is used space growing slower than free space is shrinking?
> > > > Is there some garbage collection needed in ZFS?
> > > >
> > > > Besides, although the backup server has 3 GB RAM I had to tune arc_max
> > > > to 150MB to copy the backed-up data from a 2.8TB ZFS (v6) to the
> > > > 4.5 TB ZFS (v13) by "zfs send|zfs recv" without kmalloc panic.
> > > > (I.e., the default algorithm was not sufficient.)
> >
> >
> > do a "zfs list -t all"
> >
> > you will see all snapshots and zvols then as well
>
> Oops, sorry for asking.
> Everything is OK after "zfs destroy big/big@backup" :-(
>
> I hope the info on arc_max will stay useful.
>
> Regards,
> Robert
>
> --
> Dr. Robert Eckardt --- Robert.Eckardt@Robert-Eckardt.de
>
>

There was a change between zfs v7 and v13. In 7, when you did a zfs list it
would show snapshots; after 13 it didn't unless you supplied the switch. It
still catches me out as we have a right mix of zfs versions at work, so don't
feel too bad 8)

From bu7cher at yandex.ru Thu Sep 3 11:52:33 2009
From: bu7cher at yandex.ru (Andrey V. Elsukov)
Date: Thu Sep 3 11:52:39 2009
Subject: Fw: Re: ZFS continuously growing [SOLVED]
In-Reply-To:
References: <20090902193059.M336@Robert-Eckardt.de> <78cb3d3f0909030009y505a30by769052258576bfeb@mail.gmail.com> <20090903071913.M84990@Robert-Eckardt.de> <20090903080558.M13772@Robert-Eckardt.de> <20090903094501.M34547@robert-eckardt.de>
Message-ID: <4A9FAB5A.6020302@yandex.ru>

krad wrote:
> There was a change between zfs v7 and v13. In 7, when you did a zfs list it
> would show snapshots; after 13 it didn't unless you supplied the switch. It
> still catches me out as we have a right mix of zfs versions at work, so don't
> feel too bad 8)

Try:
# zpool set listsnapshots=on <pool>

--
WBR, Andrey V. Elsukov

From ticso at cicely7.cicely.de Thu Sep 3 14:07:30 2009
From: ticso at cicely7.cicely.de (Bernd Walter)
Date: Thu Sep 3 14:07:39 2009
Subject: Fw: Re: ZFS continuously growing [SOLVED]
In-Reply-To: <4A9FAB5A.6020302@yandex.ru>
References: <20090902193059.M336@Robert-Eckardt.de> <78cb3d3f0909030009y505a30by769052258576bfeb@mail.gmail.com> <20090903071913.M84990@Robert-Eckardt.de> <20090903080558.M13772@Robert-Eckardt.de> <20090903094501.M34547@robert-eckardt.de> <4A9FAB5A.6020302@yandex.ru>
Message-ID: <20090903133207.GC60240@cicely7.cicely.de>

On Thu, Sep 03, 2009 at 03:41:14PM +0400, Andrey V. Elsukov wrote:
> krad wrote:
> >There was a change between zfs v7 and v13. In 7, when you did a zfs list it
> >would show snapshots; after 13 it didn't unless you supplied the switch. It
> >still catches me out as we have a right mix of zfs versions at work, so don't
> >feel too bad 8)
>
> Try:
> # zpool set listsnapshots=on <pool>

Ah - I had already set up a zlist alias...
This, however, is better for my well-trained fingers.

--
B.Walter http://www.bwct.de
Modbus/TCP Ethernet I/O modules, ARM-based FreeBSD computers and more.

From faust64 at gmail.com Thu Sep 3 15:50:06 2009
From: faust64 at gmail.com (Samuel Martín Moro)
Date: Thu Sep 3 15:50:13 2009
Subject: Making bootable USB keys
In-Reply-To:
References:
Message-ID:

Hello again.

BTW, my .img file is zero-filled in its first 512 bytes...
I downloaded the 8-0-BETA3-???.img, and it starts with EB3C.
I think every .img file starts like that, right?

thanks

Samuel Martín Moro
CamTrace
{EPITECH.} tek4

On Thu, Sep 3, 2009 at 5:35 PM, Samuel Martín Moro wrote:

>
> Hello
>
> I'm having some trouble creating bootable USB keys.
> I found a script (in the freebsd-hackers ML archives) that is supposed to
> create the bootable image from my ISO file.
> But it still doesn't boot... (I may be doing it wrong.)
>
> In detail:
> - We distribute a "custom" FreeBSD (4.7, 5.4, 6.2 and 7.2) server.
> - We burn our install CD (and, in a few cases, our USB sticks) on Fedora 9 (sorry...)
> - USB sticks must contain a FAT32 partition (we'd like to provide docs for
> Windows users)
>
> Well, my English isn't so great...
> so I'll post my code (more understandable)
>
> --maker.sh--
> #!/bin/sh
> ISO_DIR=/r00t
> ISO_PFIX=r00t
> VERSION=5.9.3.0
> ISO_FILE=$ISO_DIR/$ISO_PFIX-$VERSION.img
> DEVICE=
> TMPDOC=/mnt/tmpdoc
> DOCDIR=/root/samuel/docdir
> ERR=
> SFX=
> MBR=/root/samuel/mbr
> BT1=/root/samuel/boot1
> BT2=/root/samuel/boot2
>
> if [ -e "$1" ]; then
>     DEVICE=$1
> elif [ "$1" -a -e "/dev/$1" ]; then
>     DEVICE=/dev/$1
> elif [ "$1" ]; then
>     echo "$0: incorrect device specified" >&2
>     exit
> else
>     echo "$0: must specify device" >&2
>     exit
> fi
> for i in `mount | cut -d ' ' -f 1`
> do
>     if [ "`echo $i | grep $DEVICE`" ]; then
>         echo "$0: $i already mounted" >&2
>         echo " umount it manually or choose an other drive" >&2
>         exit
>     fi
> done
>
> if [ -e "$TMPDOC" -a -d "$TMPDOC" ]; then
>     echo "$0: removing $TMPDOC directory" >&2
>     rm -rf $TMPDOC
> elif [ -e "$TMPDOC" ]; then
>     mv $TMPDOC $TMPDOC.old
>     echo "$0: moved $TMPDOC to $TMPDOC.old" >&2
> fi
> mkdir $TMPDOC
>
> if [ "$2" ]; then
>     echo $2 | grep "\.img$" >/dev/null || SFX=".img"
> fi
>
> if [ -e "$2$SFX" ]; then
>     ISO_FILE=$2$SFX
> elif [ "$2" -a -e "$ISO_DIR/$2$SFX" ]; then
>     ISO_FILE=$ISO_DIR/$2$SFX
> elif [ "$2" -a -e "$ISO_DIR/$ISO_PFIX-$2$SFX" ]; then
>     ISO_FILE="$ISO_DIR/$ISO_PFIX-$2$SFX"
> else
>     echo "$0: will use default file \`$ISO_FILE'" >&2
>     echo " as system image source" >&2
> fi
> if [ -e "$ISO_FILE" ]; then
>     MSize=`ls -l $ISO_FILE | awk '{print $5}'`
> else
>     echo "$0: $ISO_FILE doesn't exist!" >&2
>     rm -rf $TMPDOC
>     exit
> fi
> if [ -z "$MSize" -o "$MSize" -lt 1 ]; then
>     echo "$0: bad image size (size=$MSize)" >&2
>     rm -rf $TMPDOC
>     exit
> fi
>
> while :
> do
>     echo " [ Working on $DEVICE ]"
>
>     echo -n " determining device geometry "
>     infos=`fdisk -l $DEVICE 2>/dev/null | grep "[0-9]* heads"`
>     ident=`fdisk -l $DEVICE 2>/dev/null | awk '/Disk identifier/{print $3}'`
>     csz=`fdisk -l $DEVICE 2>/dev/null | awk '/Units = cylinders /{print $7}'`
>     eval `echo $infos | awk '{print "hpc=" $1 " sec=" $3 " cyl=" $5}'`
>     if [ -z "$hpc" -o -z "$sec" -o -z "$cyl" -o -z "$csz" ]; then
>         echo " [ FAIL ]"
>         echo "$0: can't get infos for device $DEVICE" >&2
>         rm -rf $TMPDOC
>         exit
>     fi
>     echo " [ OK ]"
>
>     echo -n " initializing partition table "
> #   dd if=/dev/zero of=$DEVICE bs=$csz count=1 >/dev/null 2>&1
>     dd if=$BT1 of=$DEVICE >/dev/null 2>&1
>     round=128
>     tocyl=`expr $hpc '*' $sec '*' $csz`
>     ret=`expr $MSize % $tocyl`
>     MSize=`expr $MSize / $tocyl`
>     test "$ret" -eq "0" || MSize=`expr $MSize + 1`
>     s2len=$MSize
>     s2off=`expr $cyl - $s2len - 1`
>     s1len=`expr $s2off - 1`
>     s1off=1
>     sfdisk -DLqf $DEVICE >/dev/null 2>&1 <<EOF
> $s1off $s1len b
> $s2off $s2len a5 *
> EOF
>     echo " [ OK ]"
>
>     echo -n " formatting FAT32 partition "
>     dd if=/dev/zero of=${DEVICE}1 bs=$csz count=1 >/dev/null 2>&1
>     mkdosfs -i 42424242 -n "Docs" -F 32 ${DEVICE}1 >/dev/null 2>&1
>     mount -t vfat ${DEVICE}1 $TMPDOC || ERR=1
>     if [ "$ERR" ]; then
>         echo " [ FAIL ]"
>         echo "$0: unable to mount ${DEVICE}1 on $TMPDOC"
>         rm -rf $TMPDOC
>         exit
>     fi
>     echo " [ OK ]"
>
>     echo -n " copying documentation files "
>     cp -rp $DOCDIR/* $TMPDOC/ >/dev/null 2>&1 || ERR=2
>     if [ "$ERR" ]; then
>         echo " [ FAIL ]"
>         echo "$0: unable to copy doc files"
>         ERR=
>     fi
>     umount ${DEVICE}1
>     echo " [ OK ]"
>
>     echo -n " copying system "
>     dd if=$ISO_FILE of=${DEVICE}2 status=noxfer >/dev/null 2>&1
>     echo " [ OK ]"
>
>     mbrsig $DEVICE 2>&1 | awk '{print " marking device with serial " $3 }'
>     echo " [ Device ready! ]"
>     echo ""
>     echo -n " Create new USB key ? [Y/N] : " && read i
>     test "$i" = "Y" -o "$i" = "y" -o "$i" = "O" -o "$i" = "o" || i=
>     test -z "$i" && echo " [ leaving ]" && break
>     echo " Please, remove current USB key, insert new one and press enter"
>     read i
> done
> rmdir $TMPDOC
> --EOF--
>
> So, this is a "USB stick generator" I'm working on.
> It seems to work. (I've not tested everything, but the basics are OK.)
> The stick is correctly partitioned.
> The documentation is copied.
> My only problem is that it still doesn't want to boot...
>
> At the beginning, I was trying to write my ISO file directly to ${DEVICE}2.
> Then I found the following shell script, which is supposed to make my
> bootable image from my ISO file.
> I changed 2 or 3 things, but some of you may recognize it anyway:
>
> --ISOtoIMG.sh--
> #!/bin/sh
> MAKEFS=makefs
> MKLABEL=bsdlabel
> BSDTAR=tar
> DD="dd status=noxfer"
>
> make_freebsd_image()
> {
>     local tree=$1
>     local imagefile=$2
>     local boot1=${tree}/boot/boot1
>     local boot2=${tree}/boot/boot2
>
>     echo "convert tree $tree image $imagefile"
>     ${MAKEFS} -t ffs -o bsize=4096 -o fsize=512 -f 50 ${imagefile} ${tree} >/dev/null 2>&1
>     ${MKLABEL} -w -f ${imagefile} auto >/dev/null 2>&1
>     ${MKLABEL} -f ${imagefile} 2>/dev/null | sed -e '/ c:/{p;s/c:/a:/;}' | \
>         ${MKLABEL} -R -f ${imagefile} /dev/stdin >/dev/null 2>&1
>     ${DD} if=${boot1} of=${imagefile} conv=notrunc >/dev/null 2>&1
>     ${DD} if=${boot2} iseek=1 ibs=276 2>/dev/null | \
>         ${DD} of=${imagefile} oseek=1 obs=788 conv=notrunc >/dev/null 2>&1
> }
>
> extract_image()
> {
>     [ -f $1 ] || return
>     local tmp="${tree}.tree"
>     [ -e ${tmp} ] && rm -rf ${tmp}
>     mkdir -p $tmp
>     echo "extracting $tree in $tmp"
>     (cd $tmp && ${BSDTAR} xf $tree)
>     tree=$tmp
> }
>
> if [ -z "$1" ]; then
>     echo "$0: usage" >&2
>     echo " $0 {ISO_input} | {system_root IMG_output}" >&2
>     exit
> fi
> tree=`realpath $1`
> [ "$2" ] && image=`realpath $2` || image=`echo $tree | sed "s/.iso/.img/"`
> extract_image $tree
make_freebsd_image $tree $image > [ -d "$tmp" ] && (chmod -R +w $tmp && rm -rf $tmp) > --EOF-- > > This seems to work, too... > I'm just surprised: > root@granit:~/samuel# l -h /r00t/r00t-5.9.3.0* > -rw-r--r-- 1 root root 566M 2009-09-03 15:29 /r00t/r00t-5.9.3.0.img > -rw-r--r-- 1 root root 526M 2009-08-08 06:58 /r00t/r00t-5.9.3.0.iso > new file is 40M heavier than our iso image... > > Also, in the first script, I tried to do the first dd (initializing > ${DEVICE}) with: > - if=$MBR > - if=$BT1 > - if=$BT2 > - if=/dev/zero > none of that worked... > > So. > Does someone understand what am I doing wrong?! > > > Thanks for you help! > > Samuel Mart?n Moro > CamTrace > {EPITECH.} tek4 > > From faust64 at gmail.com Thu Sep 3 16:07:03 2009 From: faust64 at gmail.com (=?ISO-8859-1?Q?Samuel_Mart=EDn_Moro?=) Date: Thu Sep 3 16:07:10 2009 Subject: Making bootable USB keys Message-ID: Hello I'm having some troubles, trying to create bootable USB keys. I found (freebsd-hackers ML archives) a script, supposed to create the bootable image from my iso file. But, it still don't boot... (I may do it wrong) In details: -We distribute a FreeBSD (4.7, 5.4, 6.2 and 7.2) "custom" server. -We burn our install CD (and, in a few, our USB sticks) on a Ferdora 9 (sorry...) -USB sticks must contain a FAT32 partition (we'ld like to provide doc for windows users) Well, my english isn't so great... 
so I'll post my code (it's easier to understand):

--maker.sh--
#!/bin/sh
ISO_DIR=/r00t
ISO_PFIX=r00t
VERSION=5.9.3.0
ISO_FILE=$ISO_DIR/$ISO_PFIX-$VERSION.img
DEVICE=
TMPDOC=/mnt/tmpdoc
DOCDIR=/root/samuel/docdir
ERR=
SFX=
MBR=/root/samuel/mbr
BT1=/root/samuel/boot1
BT2=/root/samuel/boot2

if [ -e "$1" ]; then
    DEVICE=$1
elif [ "$1" -a -e "/dev/$1" ]; then
    DEVICE=/dev/$1
elif [ "$1" ]; then
    echo "$0: incorrect device specified" >&2
    exit
else
    echo "$0: must specify device" >&2
    exit
fi

for i in `mount | cut -d ' ' -f 1`
do
    if [ "`echo $i | grep $DEVICE`" ]; then
        echo "$0: $i already mounted" >&2
        echo " umount it manually or choose an other drive" >&2
        exit
    fi
done

if [ -e "$TMPDOC" -a -d "$TMPDOC" ]; then
    echo "$0: removing $TMPDOC directory" >&2
    rm -rf $TMPDOC
elif [ -e "$TMPDOC" ]; then
    mv $TMPDOC $TMPDOC.old
    echo "$0: moved $TMPDOC to $TMPDOC.old" >&2
fi
mkdir $TMPDOC

if [ "$2" ]; then
    echo $2 | grep "\.img$" >/dev/null || SFX=".img"
fi
if [ -e "$2$SFX" ]; then
    ISO_FILE=$2$SFX
elif [ "$2" -a -e "$ISO_DIR/$2$SFX" ]; then
    ISO_FILE=$ISO_DIR/$2$SFX
elif [ "$2" -a -e "$ISO_DIR/$ISO_PFIX-$2$SFX" ]; then
    ISO_FILE="$ISO_DIR/$ISO_PFIX-$2$SFX"
else
    echo "$0: will use default file \`$ISO_FILE'" >&2
    echo " as system image source" >&2
fi
if [ -e "$ISO_FILE" ]; then
    MSize=`ls -l $ISO_FILE | awk '{print $5}'`
else
    echo "$0: $ISO_FILE doesn't exist!" >&2
    rm -rf $TMPDOC
    exit
fi
if [ -z "$MSize" -o "$MSize" -lt 1 ]; then
    echo "$0: bad image size (size=$MSize)" >&2
    rm -rf $TMPDOC
    exit
fi

while :
do
    echo " [ Working on $DEVICE ]"

    echo -n " determining device geometry "
    infos=`fdisk -l $DEVICE 2>/dev/null | grep "[0-9]* heads"`
    ident=`fdisk -l $DEVICE 2>/dev/null | awk '/Disk identifier/{print $3}'`
    csz=`fdisk -l $DEVICE 2>/dev/null | awk '/Units = cylinders /{print $7}'`
    eval `echo $infos | awk '{print "hpc=" $1 " sec=" $3 " cyl=" $5}'`
    if [ -z "$hpc" -o -z "$sec" -o -z "$cyl" -o -z "$csz" ]; then
        echo " [ FAIL ]"
        echo "$0: can't get infos for device $DEVICE" >&2
        rm -rf $TMPDOC
        exit
    fi
    echo " [ OK ]"

    echo -n " initializing partition table "
    # dd if=/dev/zero of=$DEVICE bs=$csz count=1 >/dev/null 2>&1
    dd if=$BT1 of=$DEVICE >/dev/null 2>&1
    round=128
    tocyl=`expr $hpc '*' $sec '*' $csz`
    ret=`expr $MSize % $tocyl`
    MSize=`expr $MSize / $tocyl`
    test "$ret" -eq "0" || MSize=`expr $MSize + 1`
    s2len=$MSize
    s2off=`expr $cyl - $s2len - 1`
    s1len=`expr $s2off - 1`
    s1off=1
    sfdisk -DLqf $DEVICE >/dev/null 2>&1 <<EOF
$s1off $s1len b
$s2off $s2len a5 *
EOF
    echo " [ OK ]"

    echo -n " formatting FAT32 partition "
    dd if=/dev/zero of=${DEVICE}1 bs=$csz count=1 >/dev/null 2>&1
    mkdosfs -i 42424242 -n "Docs" -F 32 ${DEVICE}1 >/dev/null 2>&1
    mount -t vfat ${DEVICE}1 $TMPDOC || ERR=1
    if [ "$ERR" ]; then
        echo " [ FAIL ]"
        echo "$0: unable to mount ${DEVICE}1 on $TMPDOC"
        rm -rf $TMPDOC
        exit
    fi
    echo " [ OK ]"

    echo -n " copying documentation files "
    cp -rp $DOCDIR/* $TMPDOC/ >/dev/null 2>&1 || ERR=2
    if [ "$ERR" ]; then
        echo " [ FAIL ]"
        echo "$0: unable to copy doc files"
        ERR=
    fi
    umount ${DEVICE}1
    echo " [ OK ]"

    echo -n " copying system "
    dd if=$ISO_FILE of=${DEVICE}2 status=noxfer >/dev/null 2>&1
    echo " [ OK ]"

    mbrsig $DEVICE 2>&1 | awk '{print " marking device with serial " $3 }'
    echo " [ Device ready! ]"
    echo ""
    echo -n " Create new USB key ? [Y/N] : " && read i
    test "$i" = "Y" -o "$i" = "y" -o "$i" = "O" -o "$i" = "o" || i=
    test -z "$i" && echo " [ leaving ]" && break
    echo " Please, remove current USB key, insert new one and press enter"
    read i
done
rmdir $TMPDOC
--EOF--

So, this is a "USB stick generator" I'm working on.
It seems to work. (I've not tested everything, but the basics are OK)
The stick is correctly partitioned.
The documentation is copied.
My only problem is that it still doesn't want to boot...

At the beginning, I was trying to copy my ISO file directly onto ${DEVICE}2
Then, I found the following shell script, which is supposed to make my
bootable image from my ISO file
I changed two or three things, but some of you may recognize it anyway:

--ISOtoIMG.sh--
#!/bin/sh
MAKEFS=makefs
MKLABEL=bsdlabel
BSDTAR=tar
DD="dd status=noxfer"

make_freebsd_image()
{
    local tree=$1
    local imagefile=$2
    local boot1=${tree}/boot/boot1
    local boot2=${tree}/boot/boot2

    echo "convert tree $tree image $imagefile"
    ${MAKEFS} -t ffs -o bsize=4096 -o fsize=512 -f 50 ${imagefile} ${tree} >/dev/null 2>&1
    ${MKLABEL} -w -f ${imagefile} auto >/dev/null 2>&1
    ${MKLABEL} -f ${imagefile} 2>/dev/null | sed -e '/ c:/{p;s/c:/a:/;}' | \
        ${MKLABEL} -R -f ${imagefile} /dev/stdin >/dev/null 2>&1
    ${DD} if=${boot1} of=${imagefile} conv=notrunc >/dev/null 2>&1
    ${DD} if=${boot2} iseek=1 ibs=276 2>/dev/null | \
        ${DD} of=${imagefile} oseek=1 obs=788 conv=notrunc >/dev/null 2>&1
}

extract_image()
{
    [ -f $1 ] || return
    local tmp="${tree}.tree"
    [ -e ${tmp} ] && rm -rf ${tmp}
    mkdir -p $tmp
    echo "extracting $tree in $tmp"
    (cd $tmp && ${BSDTAR} xf $tree)
    tree=$tmp
}

if [ -z "$1" ]; then
    echo "$0: usage" >&2
    echo " $0 {ISO_input} | {system_root IMG_output}" >&2
    exit
fi
tree=`realpath $1`
[ "$2" ] && image=`realpath $2` || image=`echo $tree | sed "s/.iso/.img/"`
extract_image $tree
make_freebsd_image $tree $image
[ -d "$tmp" ] && (chmod -R +w $tmp && rm -rf $tmp)
--EOF--

This seems to work, too...
I'm just surprised:
root@granit:~/samuel# l -h /r00t/r00t-5.9.3.0*
-rw-r--r-- 1 root root 566M 2009-09-03 15:29 /r00t/r00t-5.9.3.0.img
-rw-r--r-- 1 root root 526M 2009-08-08 06:58 /r00t/r00t-5.9.3.0.iso
The new file is 40M bigger than our ISO image...

Also, in the first script, I tried to do the first dd (initializing
${DEVICE}) with:
- if=$MBR
- if=$BT1
- if=$BT2
- if=/dev/zero
none of that worked...

So.
Does anyone understand what I am doing wrong?!


Thanks for your help!

Samuel Martín Moro
CamTrace
{EPITECH.} tek4

From doconnor at gsoft.com.au Fri Sep 4 13:39:06 2009
From: doconnor at gsoft.com.au (Daniel O'Connor)
Date: Fri Sep 4 13:39:13 2009
Subject: Making bootable USB keys
In-Reply-To:
References:
Message-ID: <200909042308.56900.doconnor@gsoft.com.au>

WARNING: This e-mail has been altered by MIMEDefang. Following this
paragraph are indications of the actual changes made. For more
information about your site's MIMEDefang policy, contact
Postmaster . For more information about MIMEDefang, see:

  http://www.roaringpenguin.com/mimedefang/enduser.php3

An attachment named makeusb.sh was removed from this document as it
constituted a security hazard. If you require this document, please
contact the sender and arrange an alternate means of receiving it.
-------------- next part --------------
Skipped content of type multipart/signed

From doconnor at gsoft.com.au Fri Sep 4 14:27:59 2009
From: doconnor at gsoft.com.au (Daniel O'Connor)
Date: Fri Sep 4 14:28:06 2009
Subject: Making bootable USB keys
In-Reply-To: <200909042308.56900.doconnor@gsoft.com.au>
References: <200909042308.56900.doconnor@gsoft.com.au>
Message-ID: <200909042357.50470.doconnor@gsoft.com.au>

On Fri, 4 Sep 2009, Daniel O'Connor wrote:
> WARNING: This e-mail has been altered by MIMEDefang. Following this
> paragraph are indications of the actual changes made. For more
> information about your site's MIMEDefang policy, contact
> Postmaster .
> For more information about MIMEDefang, see:
>
>   http://www.roaringpenguin.com/mimedefang/enduser.php3
>
> An attachment named makeusb.sh was removed from this document as it
> constituted a security hazard. If you require this document, please
> contact the sender and arrange an alternate means of receiving it.

Oops, try this:
http://www.gsoft.com.au/~doconnor/makeusb.sh

-- 
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 188 bytes
Desc: This is a digitally signed message part.
Url : http://lists.freebsd.org/pipermail/freebsd-hackers/attachments/20090904/672d42a1/attachment.pgp

From alfred at freebsd.org Fri Sep 4 17:25:11 2009
From: alfred at freebsd.org (Alfred Perlstein)
Date: Fri Sep 4 17:25:18 2009
Subject: memchr() strangeness
In-Reply-To: <4AA14437.4050507@FreeBSD.org>
References: <4AA14437.4050507@FreeBSD.org>
Message-ID: <20090904172511.GI21946@elvis.mu.org>

Moved to -hackers.

Gabor, can you please make a smaller program to exhibit this behavior?
(not just the error line)

I will be glad to help out.

-Alfred

* Gabor Kovesdan [090904 10:04] wrote:
> Hello,
>
> having returned from vacation, I'm trying to track down the (hopefully)
> last critical bug in BSDL grep I worked on last summer. The binary
> file detection is implemented as follows:
> f->binary = memchr(binbuf, (filebehave != FILE_GZIP) ? '\0' : '\200', i - 1) != NULL;
>
> There's something strange with this.
> In my normal environment it works fine:
> server# echo foobar | ./grep -v '^ *+'
> foobar
>
> But in a chroot environment the binary detection is broken:
> # echo foobar | grep -v '^ *+'
> foobar
> Binary file (standard input) matches
>
> I don't know where things go bad. I've tried to print out the content of
> the buffer and the buffer length and they are the same but somehow in
> the chrooted environment this sets f->binary to true.
> Any suggestions?
>
> Thanks in advance,
>
> --
> Gabor Kovesdan
> FreeBSD Volunteer
>
> EMAIL: gabor@FreeBSD.org .:|:. gabor@kovesdan.org
> WEB: http://people.FreeBSD.org/~gabor .:|:. http://kovesdan.org
>
> --
> This mail is for the internal use of the FreeBSD project committers,
> and as such is private. This mail may not be published or forwarded
> outside the FreeBSD committers' group or disclosed to other unauthorised
> parties without the explicit permission of the author(s).

-- 
- Alfred Perlstein
.- AMA, VMOA #5191, 03 vmax, 92 gs500, 85 ch250
.- FreeBSD committer

From gabor at FreeBSD.org Fri Sep 4 17:59:05 2009
From: gabor at FreeBSD.org (Gabor Kovesdan)
Date: Fri Sep 4 17:59:12 2009
Subject: memchr() strangeness
In-Reply-To: <20090904172511.GI21946@elvis.mu.org>
References: <4AA14437.4050507@FreeBSD.org> <20090904172511.GI21946@elvis.mu.org>
Message-ID: <4AA151C9.1090301@FreeBSD.org>

Alfred Perlstein wrote:
> Moved to -hackers.
>
Thanks, this was my original intention.
> Gabor, can you please make a smaller program to exhibit this behavior?
> (not just the error line)
>
> I will be glad to help out.
>
After reading your mail, I've made a small program:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

int
main(int argc, char *argv[])
{
	bool foo;

	foo = memchr(argv[1], '\0', strlen(argv[1]));
	if (foo)
		fprintf(stderr, "Ooooops!\n");

}

And it works correctly, so grep actually fails somewhere else, but it's
very strange that it behaves differently when jailed (or chrooted).
I once submitted it for a portbuild test because it had been working correctly
for me on a production system, and then it failed on the cluster because
package builds run jailed. Then I created a jail and in fact I could
reproduce this, but only in the jail.

Regards,

-- 
Gabor Kovesdan
FreeBSD Volunteer

EMAIL: gabor@FreeBSD.org .:|:. gabor@kovesdan.org
WEB: http://people.FreeBSD.org/~gabor .:|:. http://kovesdan.org

From max at love2party.net Fri Sep 4 18:14:26 2009
From: max at love2party.net (Max Laier)
Date: Fri Sep 4 18:14:33 2009
Subject: memchr() strangeness
In-Reply-To: <4AA151C9.1090301@FreeBSD.org>
References: <4AA14437.4050507@FreeBSD.org> <20090904172511.GI21946@elvis.mu.org> <4AA151C9.1090301@FreeBSD.org>
Message-ID: <200909042014.21290.max@love2party.net>

On Friday 04 September 2009 19:43:37 Gabor Kovesdan wrote:
> Alfred Perlstein wrote:
> > Moved to -hackers.
>
> Thanks, this was my original intention.
>
> > Gabor, can you please make a smaller program to exhibit this behavior?
> > (not just the error line)
> >
> > I will be glad to help out.
>
> After reading your mail, I've made a small program:
>
> #include <stdbool.h>
> #include <stdio.h>
> #include <string.h>
>
> int
> main(int argc, char *argv[])
> {
> 	bool foo;
>
> 	foo = memchr(argv[1], '\0', strlen(argv[1]));
> 	if (foo)
> 		fprintf(stderr, "Ooooops!\n");
>
> }
>
> And it works correctly, so grep actually fails somewhere else, but it's
> very strange that it behaves differently when jailed (or chrooted). I once
> submitted it for a portbuild test because it had been working correctly
> for me on a production system, and then it failed on the cluster because
> package builds run jailed. Then I created a jail and in fact I
> could reproduce this, but only in the jail.

LC_* set to a locale not available in the jail? Just a wild guess.
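The binary-detection check Gabor quoted can be isolated into a small standalone helper for experimenting outside grep. The sketch below is a minimal illustration under stated assumptions, not BSDL grep's actual code: `looks_binary` is a hypothetical name, `len` is assumed to be the number of bytes actually read, and it scans all `len` bytes, whereas the quoted line passes `i - 1` and so skips the last byte. One detail worth checking in the original is that memchr()'s length parameter is a `size_t`, so if `i` can ever be 0, `i - 1` wraps to `SIZE_MAX` and the scan runs past the buffer. Whether that has anything to do with the jail-only failure is not established in this thread, and the sketch does not test Max's locale hypothesis.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, loosely modelled on the check quoted above:
 *     f->binary = memchr(binbuf, marker, i - 1) != NULL;
 * `len` is assumed to be the number of bytes actually read into `buf`.
 * Plain input is scanned for NUL, gzip input for byte 0x80 ('\200').
 */
static bool looks_binary(const char *buf, size_t len, bool gzip)
{
    /* memchr() converts the marker to unsigned char, so '\200' matches
     * byte 0x80 whether or not plain char is signed. */
    int marker = gzip ? '\200' : '\0';

    /* With len == 0, an expression like "len - 1" would wrap to SIZE_MAX
     * (memchr's length is a size_t) and the scan would run past the end
     * of the buffer, so the empty case is handled explicitly. */
    if (len == 0)
        return false;

    return memchr(buf, marker, len) != NULL;
}
```

In grep terms this would be called roughly as `looks_binary(buf, nread, filebehave == FILE_GZIP)`, where `buf` and `nread` are placeholder names for the read buffer and the byte count.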
-- 
/"\  Best regards,                      | mlaier@freebsd.org
\ /  Max Laier                          | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | mlaier@EFnet
/ \  ASCII Ribbon Campaign              | Against HTML Mail and News

From mitchell at wyatt672earp.force9.co.uk Sat Sep 5 17:03:39 2009
From: mitchell at wyatt672earp.force9.co.uk (Frank Mitchell)
Date: Sat Sep 5 17:03:46 2009
Subject: Death By NetBSD
Message-ID: <200909051717.15321.mitchell@wyatt672earp.force9.co.uk>

Hi:

Recently I installed NetBSD, then found FreeBSD wouldn't start. I had this problem before and believed it was due to a bug in the NetBSD Boot Selector, which I avoided installing. But this time it looked as if my FreeBSD Partition got wiped completely.

Re-trying, it looked like NetBSD spotted the FreeBSD FFSv2 Partition and decided to assign it a Mount Point of "/". This is listed if you look closely under "NetBSD Disklabel Partitions... last chance to change". I edited that Mount Point away and afterwards my (reinstalled) FreeBSD was still present.

Hey, I'm glad I keep my Data on a separate Partition. Am I the only guy who didn't know about this?

Yours Truly:
Frank Mitchell

From rivanr at gmail.com Mon Sep 7 10:59:58 2009
From: rivanr at gmail.com (Ivan Radovanovic)
Date: Mon Sep 7 10:59:05 2009
Subject: Kernel panic caused by fork
Message-ID: <4AA4E7A7.60503@gmail.com>

I was testing FreeBSD's behavior when running many threads at the same time (and I find it performs excellently) when I wanted to test how the system will behave towards a program that spawns itself too many times. I wrote a very simple program

#include <sys/types.h>
#include <unistd.h>

int main() {
  while(1)
    fork();
  return 0;
}

After running this program I got a kernel panic with the message
"get_pv_entry: increase vm.pmap.shpgperproc"
IMHO it is not a very good idea to bring the entire system down if one process misbehaves in this way; it is maybe much better to kill the offending process and to send this message to the system log.
I am not sure whether the panic is actually caused by the process forking forever or by the system trying to create a new process when the maxproc limit is already reached (since the system only prints a warning message that the maxproc limit is reached, and it only panics when I try to start a new process (like ps)).

System is FreeBSD 7.2-STABLE

kernel backtrace:
(kgdb) bt
#0 doadump () at pcpu.h:196
#1 0xc05fc477 in boot (howto=260) at ../../../kern/kern_shutdown.c:418
#2 0xc05fc782 in panic (fmt=Variable "fmt" is not available.
) at ../../../kern/kern_shutdown.c:574
#3 0xc087bccf in get_pv_entry (pmap=0xca0cb43c, try=0) at ../../../i386/i386/pmap.c:2067
#4 0xc087c0db in pmap_insert_entry (pmap=Variable "pmap" is not available.
) at ../../../i386/i386/pmap.c:2203
#5 0xc087f08e in pmap_enter (pmap=0xca0cb43c, va=671973376, access=1 '\001', m=Variable "m" is not available.
) at ../../../i386/i386/pmap.c:3114
#6 0xc082a947 in vm_fault (map=0xca0cb3b0, vaddr=671973376, fault_type=1 '\001', fault_flags=0) at ../../../vm/vm_fault.c:891
#7 0xc0881acb in trap_pfault (frame=0xefc1bd38, usermode=1, eva=671975739) at ../../../i386/i386/trap.c:828
#8 0xc0882420 in trap (frame=0xefc1bd38) at ../../../i386/i386/trap.c:396
#9 0xc086724b in calltrap () at ../../../i386/i386/exception.s:166
#10 0x280d893b in ?? ()
Previous frame inner to this frame (corrupt stack?)

From rivanr at gmail.com Tue Sep 8 09:09:18 2009
From: rivanr at gmail.com (Ivan Radovanovic)
Date: Tue Sep 8 09:09:24 2009
Subject: Kernel panic caused by fork
In-Reply-To:
References: <4AA4E7A7.60503@gmail.com>
Message-ID: <4AA61F3A.3040802@gmail.com>

Jan Mikkelsen wrote:
> A quick observation: This is not "one process misbehaving", it is a
> large number of processes misbehaving. From an administrative point
> of view, I think the response is "call setrlimit(RLIMIT_NPROC, ...)",
> otherwise the expected behaviour is for your machine to stop making
> forward progress.
>
> Having said that, I agree that panics are bad and it would be nice if
> fork() returned EAGAIN, again and again and again. Or perhaps the
> machine should just panic ...

from the fork(2) page, about errors:

     [EAGAIN]           The system-imposed limit on the total number of
                        processes under execution would be exceeded. The
                        limit is given by the sysctl(3) MIB variable
                        KERN_MAXPROC. (The limit is actually ten less than
                        this except for the super user).

It seems that the idea is to leave room for 10 more processes so root can kill the offending process, and the limits on my system are (I am running a pretty much generic kernel):

kern.maxproc: 6164
kern.maxprocperuid: 5547

so if there are only two users running at the same time in the system (the case when I did this testing) there is room for more than 500 processes after one user hits his limit - it shouldn't panic, I think.

Regards,
Ivan

From janm-freebsd-hackers at transactionware.com Tue Sep 8 09:19:57 2009
From: janm-freebsd-hackers at transactionware.com (Jan Mikkelsen)
Date: Tue Sep 8 09:20:04 2009
Subject: Kernel panic caused by fork
In-Reply-To: <4AA4E7A7.60503@gmail.com>
References: <4AA4E7A7.60503@gmail.com>
Message-ID:

Hi,

On 07/09/2009, at 8:59 PM, Ivan Radovanovic wrote:
...
> After running this program I got kernel panic with message
> "get_pv_entry: increase vm.pmap.shpgperproc"
> IMHO it is not very good idea to bring entire system down if one
> process misbehaves in this way, it is maybe much better to kill
> offending process and to send this message to system log. I am not
> sure whether the panic is actually caused by process forking forever
> or when the system tries to create new process when maxproc limit is
> already reached (since system is only printing warning message that
> maxproc limit is reached and it only panics when I try to start new
> process (like ps)).

A quick observation: This is not "one process misbehaving", it is a large number of processes misbehaving.
From an administrative point of view, I think the response is "call setrlimit(RLIMIT_NPROC, ...)", otherwise the expected behaviour is for your machine to stop making forward progress.

Having said that, I agree that panics are bad and it would be nice if fork() returned EAGAIN, again and again and again. Or perhaps the machine should just panic ...

Regards,

Jan.

From crquan at gmail.com Tue Sep 8 10:49:01 2009
From: crquan at gmail.com (Cheng Renquan)
Date: Tue Sep 8 10:49:07 2009
Subject: Kernel panic caused by fork
In-Reply-To: <4AA4E7A7.60503@gmail.com>
References: <4AA4E7A7.60503@gmail.com>
Message-ID: <91b13c310909080322s21e0fb02o423434206e5f96f6@mail.gmail.com>

On Mon, Sep 7, 2009 at 6:59 PM, Ivan Radovanovic wrote:
> I was testing FreeBSD's behavior when running many threads at the same time
> (and I find it performs excellent) when I wanted to test how system will
> behave towards program that spawns itself too many times. I wrote a very
> simple program
>
> #include <sys/types.h>
> #include <unistd.h>
>
> int main() {
>   while(1)
>     fork();
>   return 0;
> }
>
> After running this program I got kernel panic with message
> "get_pv_entry: increase vm.pmap.shpgperproc"
> IMHO it is not very good idea to bring entire system down if one process
> misbehaves in this way, it is maybe much better to kill offending process
> and to send this message to system log. I am not sure whether the panic is
> actually caused by process forking forever or when the system tries to
> create new process when maxproc limit is already reached (since system is
> only printing warning message that maxproc limit is reached and it only
> panics when I try to start new process (like ps)).
> System is FreeBSD 7.2-STABLE

It's just the "fork bomb" problem; no operating system kernel deals with it well:

http://en.wikipedia.org/wiki/Fork_bomb

And it's really a system administration problem rather than a kernel problem.

-- 
Cheng Renquan, from Shenzhen, China

From joachim.kuebart at gmx.net Tue Sep 8 11:12:31 2009
From: joachim.kuebart at gmx.net (Joachim Kuebart)
Date: Tue Sep 8 11:12:38 2009
Subject: License change
Message-ID: <1252406745.778.22.camel@yacht>

Hi,

much to my embarrassment, I noticed recently that there is a file authored by me using the 4-clause BSD license in the FreeBSD tree. The file src/sys/dev/sound/pci/es137x.c uses the 4-clause BSD license while the accompanying .h file uses a kind of 3-clause BSD license that I apparently made up at the time.

I would like to change the license of es137x.c to the 3-clause BSD license. Unfortunately I cannot prove that I'm in fact the original author because the e-mail address given in the file is no longer active. If this means that the license cannot be changed anymore, that's unfortunate, but I guess it's the way it has to be...

Best regards,

Joachim

From julian at elischer.org Tue Sep 8 16:24:38 2009
From: julian at elischer.org (Julian Elischer)
Date: Tue Sep 8 16:24:46 2009
Subject: Kernel panic caused by fork
In-Reply-To: <91b13c310909080322s21e0fb02o423434206e5f96f6@mail.gmail.com>
References: <4AA4E7A7.60503@gmail.com> <91b13c310909080322s21e0fb02o423434206e5f96f6@mail.gmail.com>
Message-ID: <4AA68544.8050102@elischer.org>

Cheng Renquan wrote:
> On Mon, Sep 7, 2009 at 6:59 PM, Ivan Radovanovic wrote:
>> I was testing FreeBSD's behavior when running many threads at the same time
>> (and I find it performs excellent) when I wanted to test how system will
>> behave towards program that spawns itself too many times.
>> I wrote a very
>> simple program
>>
>> #include <sys/types.h>
>> #include <unistd.h>
>>
>> int main() {
>>   while(1)
>>     fork();
>>   return 0;
>> }
>>
>> After running this program I got kernel panic with message
>> "get_pv_entry: increase vm.pmap.shpgperproc"
>> IMHO it is not very good idea to bring entire system down if one process
>> misbehaves in this way, it is maybe much better to kill offending process
>> and to send this message to system log. I am not sure whether the panic is
>> actually caused by process forking forever or when the system tries to
>> create new process when maxproc limit is already reached (since system is
>> only printing warning message that maxproc limit is reached and it only
>> panics when I try to start new process (like ps)).
>> System is FreeBSD 7.2-STABLE
>
> It's just the "fork bomb" problem, all operating system kernels cannot
> deal with it well,
>
> http://en.wikipedia.org/wiki/Fork_bomb

It's more a tuning problem I think. The system should tune itself so
that MAXPROC is hit before critical resources are exhausted I think.
Having said that, there are a lot of resources that need to be watched.

>
> And it's really a system administration problem rather than a kernel problem,
>

From rivanr at gmail.com Tue Sep 8 16:42:05 2009
From: rivanr at gmail.com (Ivan Radovanovic)
Date: Tue Sep 8 16:42:14 2009
Subject: Kernel panic caused by fork
In-Reply-To: <4AA68544.8050102@elischer.org>
References: <4AA4E7A7.60503@gmail.com> <91b13c310909080322s21e0fb02o423434206e5f96f6@mail.gmail.com> <4AA68544.8050102@elischer.org>
Message-ID: <4AA68959.6000808@gmail.com>

Julian Elischer wrote:
> Cheng Renquan wrote:
>> On Mon, Sep 7, 2009 at 6:59 PM, Ivan Radovanovic
>> wrote:
>>> I was testing FreeBSD's behavior when running many threads at the
>>> same time
>>> (and I find it performs excellent) when I wanted to test how system
>>> will
>>> behave towards program that spawns itself too many times.
>>> I wrote a
>>> very
>>> simple program
>> It's just the "fork bomb" problem, all operating system kernels cannot
>> deal with it well,
>>
>> http://en.wikipedia.org/wiki/Fork_bomb
> It's more a tuning problem I think. The system should tune itself so
> that MAXPROC is hit before critical resources are exhausted I think.
> Having said that, there are a lot of resources that need to be watched.

After reading this nice article on Wikipedia and learning about that bash one-liner, I wanted to check if it really works, but I didn't want to bring the system down again (and create a crash dump and so on), so I wanted to limit the number of processes for a single user. As root I did

sysctl kern.maxprocperuid=1000

and after that I started bash and, as a normal user, typed

:(){ :|:& };:

First thing to notice - there were more than 4000 spawned bash processes (why, if I set the limit to 1000 per user id?); however, the system didn't crash and I was eventually able to recover with

/bin/kill -9 -- -1234

1234 being the process group id of the bash process.

Regards,
Ivan

From ivoras at freebsd.org Tue Sep 8 21:01:49 2009
From: ivoras at freebsd.org (Ivan Voras)
Date: Tue Sep 8 21:01:56 2009
Subject: Kernel panic caused by fork
In-Reply-To: <4AA4E7A7.60503@gmail.com>
References: <4AA4E7A7.60503@gmail.com>
Message-ID:

Ivan Radovanovic wrote:
> I was testing FreeBSD's behavior when running many threads at the same
> time (and I find it performs excellent) when I wanted to test how system
> will behave towards program that spawns itself too many times. I wrote a
> very simple program
>
> #include <sys/types.h>
> #include <unistd.h>
>
> int main() {
>   while(1)
>     fork();
>   return 0;
> }

A simple fork bomb. Hmm, it should just crash and if it does crash it's a regression. I've "tested" fork bombs on 7-STABLE and early 8-CURRENT and they were behaving as expected - stopped at the maxproc limit.
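Jan's suggestion of capping processes with `setrlimit(RLIMIT_NPROC, ...)`, so that fork(2) fails with EAGAIN instead of the machine running out of kernel resources, can be sketched as below. This is a minimal illustration, not a fix for the panic: `cap_nproc` and `spawn_bounded` are hypothetical helpers, the limit is assumed to count every process owned by the caller's real user ID (so how many children actually get created depends on what else that user is running), and the superuser is typically exempt from it.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 64

/* Hypothetical helper: set this process's soft RLIMIT_NPROC to `want`,
 * clamped to the hard limit. Children inherit the new limit.
 * Returns 0 on success, -1 (errno set) on failure. */
static int cap_nproc(rlim_t want)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NPROC, &rl) != 0)
        return -1;
    if (rl.rlim_max != RLIM_INFINITY && want > rl.rlim_max)
        want = rl.rlim_max;      /* the soft limit may not exceed the hard one */
    rl.rlim_cur = want;
    return setrlimit(RLIMIT_NPROC, &rl);
}

/* Hypothetical helper: fork up to `n` children (at most MAX_CHILDREN) that
 * exit immediately, stopping at the first failure. Children are reaped only
 * at the end, so until then they keep counting against the per-user limit.
 * Returns the number of children actually created. */
static int spawn_bounded(int n)
{
    pid_t pids[MAX_CHILDREN];
    int made = 0;

    if (n > MAX_CHILDREN)
        n = MAX_CHILDREN;
    while (made < n) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);            /* child: _exit() skips the parent's stdio buffers */
        if (pid < 0) {
            if (errno != EAGAIN) /* EAGAIN: process limit reached, stop quietly */
                perror("fork");
            break;
        }
        pids[made++] = pid;
    }
    for (int i = 0; i < made; i++)
        waitpid(pids[i], NULL, 0);
    return made;
}
```

In sh(1) and bash, `ulimit -u` sets the same per-user limit for the shell and everything it starts, which is the administrative counterpart of calling setrlimit() from a program.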
I don't currently have spare 7.x stable machines but I have just run it on an 8-BETA2 one and the maxproc limit still works, though as expected the console is almost unusable for anything except switching (i.e. processes don't get to receive input very often). A lot of them are in "locked" state with "*vm ob" as state/channel name. I couldn't clean the system from the fork bomb with "killall" as root.

Can you describe your machine? Mine is an Atom-based (slow) netbook with 1 GB RAM.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 259 bytes
Desc: OpenPGP digital signature
Url : http://lists.freebsd.org/pipermail/freebsd-hackers/attachments/20090908/a679699b/signature.pgp

From alexbestms at math.uni-muenster.de Wed Sep 9 17:01:34 2009
From: alexbestms at math.uni-muenster.de (Alexander Best)
Date: Wed Sep 9 17:01:41 2009
Subject: Buffer overflow detected by REDZONE with linuxulator
Message-ID:

hi there,

i've installed emulators/linux_dist-gentoo-stage3 and grabbed a snapshot from the ltp git repository (http://ltp.sourceforge.net/). as expected some tests failed because i'm using compat.linux.osrelease: 2.6.16 which is still missing a few linux syscalls, ipcs and ioctls.

however i also noticed REDZONE reporting buffer overflows. i'm only a user and not a developer so i don't know if the ltp is to be blamed or if the problem lies within the linuxulator.

i'm running 9.0-CURRENT (r196879). as i mentioned before i'm using 2.6 linux kernel emulation.

here are the buffer overflow reports:

Sep 9 14:12:42 otaku kernel: REDZONE: Buffer overflow detected. 9 bytes corrupted after 0xcc28c483 (3 bytes allocated).
Sep 9 14:12:42 otaku kernel: Allocation backtrace:
Sep 9 14:12:42 otaku kernel: #0 0xc0709aaa at redzone_setup+0x3a
Sep 9 14:12:42 otaku kernel: #1 0xc05bc673 at malloc+0x1c3
Sep 9 14:12:42 otaku kernel: #2 0xc07428b8 at linux_getsockaddr+0x48
Sep 9 14:12:42 otaku kernel: #3 0xc0742eb8 at linux_socketcall+0x178
Sep 9 14:12:42 otaku kernel: #4 0xc0772f56 at syscall+0x2a6
Sep 9 14:12:42 otaku kernel: #5 0xc07568b0 at Xint0x80_syscall+0x20
Sep 9 14:12:42 otaku kernel: Free backtrace:
Sep 9 14:12:42 otaku kernel: #0 0xc0709a3a at redzone_check+0x17a
Sep 9 14:12:42 otaku kernel: #1 0xc05bc32d at free+0x5d
Sep 9 14:12:42 otaku kernel: #2 0xc0742ef0 at linux_socketcall+0x1b0
Sep 9 14:12:42 otaku kernel: #3 0xc0772f56 at syscall+0x2a6
Sep 9 14:12:42 otaku kernel: #4 0xc07568b0 at Xint0x80_syscall+0x20
Sep 9 14:20:08 otaku kernel: REDZONE: Buffer overflow detected. 4 bytes corrupted after 0xcc2538ea (106 bytes allocated).
Sep 9 14:20:08 otaku kernel: Allocation backtrace:
Sep 9 14:20:08 otaku kernel: #0 0xc0709aaa at redzone_setup+0x3a
Sep 9 14:20:08 otaku kernel: #1 0xc05bc673 at malloc+0x1c3
Sep 9 14:20:08 otaku kernel: #2 0xc063a902 at unp_connect+0x162
Sep 9 14:20:08 otaku kernel: #3 0xc063d6c9 at uipc_connect+0x49
Sep 9 14:20:08 otaku kernel: #4 0xc062fde2 at soconnect+0x52
Sep 9 14:20:08 otaku kernel: #5 0xc0638eb6 at kern_connect+0x96
Sep 9 14:20:08 otaku kernel: #6 0xc0742c7b at linux_connect+0x3b
Sep 9 14:20:08 otaku kernel: #7 0xc0742f22 at linux_socketcall+0x1e2
Sep 9 14:20:08 otaku kernel: #8 0xc0772f56 at syscall+0x2a6
Sep 9 14:20:08 otaku kernel: #9 0xc07568b0 at Xint0x80_syscall+0x20
Sep 9 14:20:08 otaku kernel: Free backtrace:
Sep 9 14:20:08 otaku kernel: #0 0xc0709a3a at redzone_check+0x17a
Sep 9 14:20:08 otaku kernel: #1 0xc05bc32d at free+0x5d
Sep 9 14:20:08 otaku kernel: #2 0xc063bfb2 at uipc_detach+0x242
Sep 9 14:20:08 otaku kernel: #3 0xc0632a7e at sofree+0x22e
Sep 9 14:20:08 otaku kernel: #4 0xc0632f26 at soclose+0x386
Sep 9 14:20:08 otaku kernel: #5 0xc0617c49 at soo_close+0x29
Sep 9 14:20:08 otaku kernel: #6 0xc0598b13 at _fdrop+0x43
Sep 9 14:20:08 otaku kernel: #7 0xc059ab90 at closef+0x290
Sep 9 14:20:08 otaku kernel: #8 0xc059af22 at kern_close+0x102
Sep 9 14:20:08 otaku kernel: #9 0xc059b09a at close+0x1a
Sep 9 14:20:08 otaku kernel: #10 0xc0772f56 at syscall+0x2a6
Sep 9 14:20:08 otaku kernel: #11 0xc07568b0 at Xint0x80_syscall+0x20
Sep 9 14:20:09 otaku kernel: REDZONE: Buffer overflow detected. 4 bytes corrupted after 0xccc653ea (106 bytes allocated).
Sep 9 14:20:09 otaku kernel: Allocation backtrace:
Sep 9 14:20:09 otaku kernel: #0 0xc0709aaa at redzone_setup+0x3a
Sep 9 14:20:09 otaku kernel: #1 0xc05bc673 at malloc+0x1c3
Sep 9 14:20:09 otaku kernel: #2 0xc063a902 at unp_connect+0x162
Sep 9 14:20:09 otaku kernel: #3 0xc063d6c9 at uipc_connect+0x49
Sep 9 14:20:09 otaku kernel: #4 0xc062fde2 at soconnect+0x52
Sep 9 14:20:09 otaku kernel: #5 0xc0638eb6 at kern_connect+0x96
Sep 9 14:20:09 otaku kernel: #6 0xc0742c7b at linux_connect+0x3b
Sep 9 14:20:09 otaku kernel: #7 0xc0742f22 at linux_socketcall+0x1e2
Sep 9 14:20:09 otaku kernel: #8 0xc0772f56 at syscall+0x2a6
Sep 9 14:20:09 otaku kernel: #9 0xc07568b0 at Xint0x80_syscall+0x20
Sep 9 14:20:09 otaku kernel: Free backtrace:
Sep 9 14:20:09 otaku kernel: #0 0xc0709a3a at redzone_check+0x17a
Sep 9 14:20:09 otaku kernel: #1 0xc05bc32d at free+0x5d
Sep 9 14:20:09 otaku kernel: #2 0xc063bfb2 at uipc_detach+0x242
Sep 9 14:20:09 otaku kernel: #3 0xc0632a7e at sofree+0x22e
Sep 9 14:20:09 otaku kernel: #4 0xc0632f26 at soclose+0x386
Sep 9 14:20:09 otaku kernel: #5 0xc0617c49 at soo_close+0x29
Sep 9 14:20:09 otaku kernel: #6 0xc0598b13 at _fdrop+0x43
Sep 9 14:20:09 otaku kernel: #7 0xc059ab90 at closef+0x290
Sep 9 14:20:09 otaku kernel: #8 0xc059af22 at kern_close+0x102
Sep 9 14:20:09 otaku kernel: #9 0xc059b09a at close+0x1a
Sep 9 14:20:09 otaku kernel: #10 0xc0772f56 at syscall+0x2a6
Sep 9 14:20:09 otaku kernel: #11 0xc07568b0 at Xint0x80_syscall+0x20
Sep 9 14:20:09 otaku kernel: REDZONE: Buffer overflow detected. 4 bytes corrupted after 0xcf45a9ea (106 bytes allocated).
Sep 9 14:20:09 otaku kernel: Allocation backtrace:
Sep 9 14:20:09 otaku kernel: #0 0xc0709aaa at redzone_setup+0x3a
Sep 9 14:20:09 otaku kernel: #1 0xc05bc673 at malloc+0x1c3
Sep 9 14:20:09 otaku kernel: #2 0xc063a902 at unp_connect+0x162
Sep 9 14:20:09 otaku kernel: #3 0xc063d6c9 at uipc_connect+0x49
Sep 9 14:20:09 otaku kernel: #4 0xc062fde2 at soconnect+0x52
Sep 9 14:20:09 otaku kernel: #5 0xc0638eb6 at kern_connect+0x96
Sep 9 14:20:09 otaku kernel: #6 0xc0742c7b at linux_connect+0x3b
Sep 9 14:20:09 otaku kernel: #7 0xc0742f22 at linux_socketcall+0x1e2
Sep 9 14:20:09 otaku kernel: #8 0xc0772f56 at syscall+0x2a6
Sep 9 14:20:09 otaku kernel: #9 0xc07568b0 at Xint0x80_syscall+0x20
Sep 9 14:20:09 otaku kernel: Free backtrace:
Sep 9 14:20:09 otaku kernel: #0 0xc0709a3a at redzone_check+0x17a
Sep 9 14:20:09 otaku kernel: #1 0xc05bc32d at free+0x5d
Sep 9 14:20:09 otaku kernel: #2 0xc063bfb2 at uipc_detach+0x242
Sep 9 14:20:09 otaku kernel: #3 0xc0632a7e at sofree+0x22e
Sep 9 14:20:09 otaku kernel: #4 0xc0632f26 at soclose+0x386
Sep 9 14:20:09 otaku kernel: #5 0xc0617c49 at soo_close+0x29
Sep 9 14:20:09 otaku kernel: #6 0xc0598b13 at _fdrop+0x43
Sep 9 14:20:09 otaku kernel: #7 0xc059ab90 at closef+0x290
Sep 9 14:20:09 otaku kernel: #8 0xc059b55a at fdfree+0x3ea
Sep 9 14:20:09 otaku kernel: #9 0xc05a57b3 at exit1+0x513
Sep 9 14:20:09 otaku kernel: #10 0xc05d17f4 at sigexit+0xa14
Sep 9 14:20:09 otaku kernel: #11 0xc05d19fd at postsig+0x1dd
Sep 9 14:20:09 otaku kernel: #12 0xc0608fca at ast+0x35a
Sep 9 14:20:09 otaku kernel: #13 0xc0757174 at doreti_ast+0x17

cheers.
alex From guomingyan at gmail.com Thu Sep 10 06:55:57 2009 From: guomingyan at gmail.com (MingyanGuo) Date: Thu Sep 10 06:56:04 2009 Subject: How to prevent other CPU from accessing a set of pages before calling pmap_remove_all function Message-ID: <1fa17f810909092326l1271df94t1dea5ac9d5deba1b@mail.gmail.com> Hi all, I find that function pmap_remove_all for arch amd64 works with a time window between reading & clearing the PTE flags(access flag and dirty flag) and invalidating its TLB entry on other CPU. After some discussion with Li Xin(cced), I think all the processes that are using the PTE being removed should be blocked before calling pmap_remove_all, or other CPU may dirty the page but does not set the dirty flag before the TLB entry is flushed. But I can not find how to block them to call the function. I read the function vm_pageout_scan in file vm/vm_pageout.c but can not find the exact method it used. Or I just misunderstood the semantics of function pmap_remove_all ? Thanks in advance. Regards, MingyanGuo From guomingyan at gmail.com Thu Sep 10 06:57:25 2009 From: guomingyan at gmail.com (MingyanGuo) Date: Thu Sep 10 06:57:37 2009 Subject: How to prevent other CPU from accessing a set of pages before calling pmap_remove_all function In-Reply-To: <1fa17f810909092326l1271df94t1dea5ac9d5deba1b@mail.gmail.com> References: <1fa17f810909092326l1271df94t1dea5ac9d5deba1b@mail.gmail.com> Message-ID: <1fa17f810909092357x8625182q970f8fb6aa76e7a9@mail.gmail.com> On Wed, Sep 9, 2009 at 11:26 PM, MingyanGuo wrote: > Hi all, > > I find that function pmap_remove_all for arch amd64 works with a time > window between reading & clearing the PTE flags(access flag and dirty flag) > and invalidating its TLB entry on other CPU. 
After some discussion with Li > Xin(cced), I think all the processes that are using the PTE being removed > should be blocked before calling pmap_remove_all, or other CPU may dirty the > page but does not set the dirty flag before the TLB entry is flushed. But I > can not find how to block them to call the function. I read the function > vm_pageout_scan in file vm/vm_pageout.c but can not find the exact method it > used. Or I just misunderstood the semantics of function pmap_remove_all ? > > Thanks in advance. > > Regards, > MingyanGuo > Sorry for the noise. I understand the logic now. There is no time window problem between reading & clearing the PTE and invalidating it on other CPU, even if other CPU is using the PTE. I misunderstood the logic. Regards, MingyanGuo From kostikbel at gmail.com Thu Sep 10 12:08:51 2009 From: kostikbel at gmail.com (Kostik Belousov) Date: Thu Sep 10 12:08:58 2009 Subject: How to prevent other CPU from accessing a set of pages before calling pmap_remove_all function In-Reply-To: <1fa17f810909092357x8625182q970f8fb6aa76e7a9@mail.gmail.com> References: <1fa17f810909092326l1271df94t1dea5ac9d5deba1b@mail.gmail.com> <1fa17f810909092357x8625182q970f8fb6aa76e7a9@mail.gmail.com> Message-ID: <20090910120811.GH47688@deviant.kiev.zoral.com.ua> On Wed, Sep 09, 2009 at 11:57:24PM -0700, MingyanGuo wrote: > On Wed, Sep 9, 2009 at 11:26 PM, MingyanGuo wrote: > > > Hi all, > > > > I find that function pmap_remove_all for arch amd64 works with a time > > window between reading & clearing the PTE flags(access flag and dirty flag) > > and invalidating its TLB entry on other CPU. After some discussion with Li > > Xin(cced), I think all the processes that are using the PTE being removed > > should be blocked before calling pmap_remove_all, or other CPU may dirty the > > page but does not set the dirty flag before the TLB entry is flushed. But I > > can not find how to block them to call the function. 
I read the function > > vm_pageout_scan in file vm/vm_pageout.c but can not find the exact method it > > used. Or I just misunderstood the semantics of function pmap_remove_all ? > > > > Thanks in advance. > > > > Regards, > > MingyanGuo > > > > Sorry for the noise. I understand the logic now. There is no time window > problem between reading & clearing the PTE and invalidating it on other CPU, > even if other CPU is using the PTE. I misunderstood the logic. Hmm. What would happen for the following scenario. Assume that the page m is mapped by vm map active on CPU1, and that CPU1 has cached TLB entry for some writable mapping of this page, but neither TLB entry not PTE has dirty bit set. Then, assume that the following sequence of events occur:

CPU1:                              CPU2:
                                   call pmap_remove_all(m)
                                   clear pte
write to the address mapped
by m [*]
                                   invalidate the TLB,
                                   possibly making IPI to CPU1

I assume that at the point marked [*], we can
- either loose the dirty bit, while CPU1 (atomically) sets the dirty bit in the cleared pte. Besides not properly tracking the modification status of the page, it could also cause the page table page to be modified, that would create non-zero page with PG_ZERO flag set.
- or CPU1 re-reads the PTE entry when setting the dirty bit, and generates #pf since valid bit in PTE is zero.
Intel documentation mentions that dirty or accessed bits updates are done with locked cycle, that definitely means that PTE is re-read, but I cannot find whether valid bit is rechecked.
From linda.messerschmidt at gmail.com Thu Sep 10 16:46:46 2009 From: linda.messerschmidt at gmail.com (Linda Messerschmidt) Date: Thu Sep 10 16:46:53 2009 Subject: Intermittent system hangs on 7.2-RELEASE-p1 In-Reply-To: <200908271729.55213.jhb@freebsd.org> References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com> <200908261642.59419.jhb@freebsd.org> <237c27100908271237y66219ef4o4b1b8a6e13ab2f6c@mail.gmail.com> <200908271729.55213.jhb@freebsd.org> Message-ID: <237c27100909100946q3d186af3h66757e0efff307a5@mail.gmail.com> On Thu, Aug 27, 2009 at 5:29 PM, John Baldwin wrote: > Ah, cool, what you want to do is use KTR with KTR_SCHED and then use > schedgraph.py (src/tools/sched) to get a visual picture of what the box does > during a hang. The timestamps in KTR are TSC cycle counts rather than an > actual wall time which is why they look off. If you have a lot of events you > may want to use a larger KTR_ENTRIES size btw (I use 1048576 (2 ^ 20) here at > work to get large (multiple-second) traces). I'm still working on this. I enabled KTR and set it up to log KTR_SCHED events. Then, I wrote a script to exercise the HTTP server that actually ran on that machine, and set it to issue "sysctl debug.ktr.cpumask=0" and abort if a request took over 2 seconds. 28,613 requests later, it tripped over one that took 2007ms. (Just a refresher: this is a static file being served by an Apache process that has nothing else to do but serve this file on a relatively unloaded machine.) I don't have access to any machines that can run X, so I did the best I could to examine it from the shell. First, this machine has two CPU's so I split up the KTR results on a per-CPU basis so I could look at each individually.
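The per-CPU split described above can be sketched in Python, the language schedgraph.py is written in. This is a minimal sketch, not a tool from the thread: it assumes the trace has already been dumped to text with one record per line in a hypothetical layout of `<index> <cpu> <tsc> <message>` (roughly what `ktrdump -c -t` produces, though the exact columns may differ and the parser may need adjusting). Header lines and anything else that does not parse are skipped.

```python
from collections import defaultdict


def split_by_cpu(lines):
    """Group KTR text records by CPU and sort each group by timestamp.

    Assumed (hypothetical) record layout, whitespace separated:
        <index> <cpu> <tsc> <message ...>
    Header lines and anything else that does not parse are skipped.
    """
    per_cpu = defaultdict(list)
    for line in lines:
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue
        _index, cpu, tsc, message = fields
        if not (cpu.isdigit() and tsc.isdigit()):
            continue
        per_cpu[int(cpu)].append((int(tsc), message.rstrip("\n")))
    # Sort so that each CPU's records read in time order.
    for records in per_cpu.values():
        records.sort()
    return dict(per_cpu)
```

Feeding it an open file (for example `split_by_cpu(open("ktr.out"))`, with a hypothetical file name) yields a dict mapping CPU number to a list of `(tsc, message)` pairs that can be paged through one CPU at a time.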
With KTR_ENTRIES set to 1048576, I got about 53 seconds of data with just KTR_SCHED enabled. Since I was interested in a 2.007 second period of time right at the end, I hacked it down to the last 3.795 seconds. In the 3.795 seconds captured in the trace period on CPU 0 that includes the entire 2.007 second stall, CPU 0 was idle for 3.175 seconds. In the same period, CPU 1 was idle for 3.2589 seconds. I did the best I could to manually page through all the scheduling activity on both CPUs during that 3.7 second time, and I didn't see anything really disruptive. Mainly idle, with jumps into the clock and ethernet kernel threads, as well as httpd. If I understand that correctly and have done everything right, that means that whatever happened, it wasn't related to CPU contention or scheduling issues of any sort. So, a couple of follow-up questions: First, what else should I be looking at? I built the kernel with kind of a lot of KTR flags (KTR_LOCK|KTR_SCHED|KTR_PROC|KTR_INTR|KTR_CALLOUT|KTR_UMA|KTR_SYSC) but enabling them all produces enough output that even 1048576 entries doesn't always go back two seconds; the volume of data is all but unmanageable. Second, is there any way to correlate the process address reported by the KTR scheduler entries back to a PID? It'd be nice to be able to view the scheduler activity just for the process I'm interested in, but I can't figure out which one it is. :) Thanks! 
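The idle-time figures above (for example, 3.175 s idle out of a 3.795 s window on CPU 0) can be reproduced with a sketch like the following. It assumes one CPU's switch records have already been reduced to hypothetical `(tsc, thread_name)` pairs meaning "this thread starts running here", and that the TSC frequency is known (on FreeBSD the `machdep.tsc_freq` sysctl reports it). Because KTR timestamps are TSC counts, every duration is divided by that frequency.

```python
def idle_seconds(switches, tsc_freq, window_s, idle_prefix="idle"):
    """Seconds the idle thread ran during the last `window_s` seconds.

    switches: time-sorted (tsc, thread_name) pairs for one CPU, each meaning
    "thread_name starts running at tsc" (a hypothetical, pre-digested form
    of KTR_SCHED records). The trace is taken to end at the last switch.
    tsc_freq: TSC ticks per second.
    """
    if not switches:
        return 0.0
    end = switches[-1][0]
    start = end - int(window_s * tsc_freq)
    idle_ticks = 0
    # Each record runs until the next one starts.
    for (t0, name), (t1, _) in zip(switches, switches[1:]):
        if not name.startswith(idle_prefix):
            continue
        # Clip the interval to the window [start, end].
        lo, hi = max(t0, start), min(t1, end)
        if hi > lo:
            idle_ticks += hi - lo
    return idle_ticks / tsc_freq
```

Dividing the result by `window_s` gives the idle fraction; by the numbers quoted above, CPU 0 was idle for roughly 84% of the window.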
From guomingyan at gmail.com Thu Sep 10 17:21:02 2009 From: guomingyan at gmail.com (guomingyan@gmail.com) Date: Thu Sep 10 17:21:10 2009 Subject: How to prevent other CPU from accessing a set of pages before calling pmap_remove_all functi In-Reply-To: <20090910120811.GH47688@deviant.kiev.zoral.com.ua> Message-ID: <001636b149b575c79204733c6c1c@google.com> On Sep 10, 2009 5:08am, Kostik Belousov wrote: > On Wed, Sep 09, 2009 at 11:57:24PM -0700, MingyanGuo wrote: > > On Wed, Sep 9, 2009 at 11:26 PM, MingyanGuo guomingyan@gmail.com> wrote: > > > > > Hi all, > > > > > > I find that function pmap_remove_all for arch amd64 works with a time > > > window between reading & clearing the PTE flags(access flag and dirty > flag) > > > and invalidating its TLB entry on other CPU. After some discussion > with Li > > > Xin(cced), I think all the processes that are using the PTE being > removed > > > should be blocked before calling pmap_remove_all, or other CPU may > dirty the > > > page but does not set the dirty flag before the TLB entry is flushed. > But I > > > can not find how to block them to call the function. I read the > function > > > vm_pageout_scan in file vm/vm_pageout.c but can not find the exact > method it > > > used. Or I just misunderstood the semantics of function > pmap_remove_all ? > > > > > > Thanks in advance. > > > > > > Regards, > > > MingyanGuo > > > > > > > Sorry for the noise. I understand the logic now. There is no time window > > problem between reading & clearing the PTE and invalidating it on other > CPU, > > even if other CPU is using the PTE. I misunderstood the logic. > Hmm. What would happen for the following scenario. > Assume that the page m is mapped by vm map active on CPU1, and that > CPU1 has cached TLB entry for some writable mapping of this page, > but neither TLB entry not PTE has dirty bit set. 
> Then, assume that the following sequence of events occur: > CPU1: CPU2: > call pmap_remove_all(m) > clear pte > write to the address mapped > by m [*] > invalidate the TLB, > possibly making IPI to CPU1 > I assume that at the point marked [*], we can > - either loose the dirty bit, while CPU1 (atomically) sets the dirty bit > in the cleared pte. > Besides not properly tracking the modification status of the page, > it could also cause the page table page to be modified, that would > create non-zero page with PG_ZERO flag set. > - or CPU1 re-reads the PTE entry when setting the dirty bit, and generates > #pf since valid bit in PTE is zero. > Intel documentation mentions that dirty or accessed bits updates are done > with locked cycle, that definitely means that PTE is re-read, but I cannot > find whether valid bit is rechecked. I am not an architecture expert, but from a programmer's view, I *think* using the 'in memory' PTE structure for the first write to that PTE is more reasonable. To set the dirty bit, a CPU has to access memory with locked cycles, so using the 'in memory' PTE structure should add few performance burden but more friendly to software. However, it is just my guess, I am reading the manuals to find if any description about it. Regards, MingyanGuo From rysto32 at gmail.com Thu Sep 10 17:30:33 2009 From: rysto32 at gmail.com (Ryan Stone) Date: Thu Sep 10 17:30:40 2009 Subject: Intermittent system hangs on 7.2-RELEASE-p1 In-Reply-To: <237c27100909100946q3d186af3h66757e0efff307a5@mail.gmail.com> References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com> <200908261642.59419.jhb@freebsd.org> <237c27100908271237y66219ef4o4b1b8a6e13ab2f6c@mail.gmail.com> <200908271729.55213.jhb@freebsd.org> <237c27100909100946q3d186af3h66757e0efff307a5@mail.gmail.com> Message-ID: You should be able to run schedgraph.py on a windows machine with python installed. It works just fine for me on XP. 
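For the pmap_remove_all() question earlier in this thread, the interleaving Kostik Belousov describes can be modelled in a few lines of Python. This is a toy, single-threaded model, not kernel code. The PG_V/PG_RW/PG_A/PG_M values are the x86 PTE bit positions. The "atomic" clear stands in for an atomic read-and-clear of the PTE (to our understanding the amd64 pmap uses `pte_load_clear()` for this). Whether the CPU re-checks PG_V while setting the dirty bit is left as a parameter, because that is exactly the open question in the thread.

```python
# x86 PTE bits: present, writable, accessed, dirty.
PG_V, PG_RW, PG_A, PG_M = 0x001, 0x002, 0x020, 0x040


class PTE:
    """An in-memory page table entry shared by both simulated CPUs."""

    def __init__(self, value):
        self.value = value


def cpu1_write(pte, recheck_valid):
    """CPU1 writes through a stale, writable TLB entry whose D bit is clear.

    The hardware must set PG_M in the in-memory PTE with a locked cycle.
    Returns True if the write faults instead; that is only possible if the
    CPU re-checks PG_V during the locked cycle (an assumption here).
    """
    if recheck_valid and not (pte.value & PG_V):
        return True
    pte.value |= PG_M
    return False


def remove_mapping(pte, cpu1_writes, atomic=True, recheck_valid=False):
    """Simulate CPU2 tearing down the mapping while CPU1 may write.

    cpu1_writes: "before" the clear, "between" a non-atomic load and store,
    or "after" the clear but before the TLB shootdown reaches CPU1.
    Returns (dirty_seen_by_pmap, leftover_pte_value, cpu1_faulted).
    """
    if atomic and cpu1_writes == "between":
        raise ValueError("an atomic read-and-clear has no window in between")
    faulted = False
    if cpu1_writes == "before":
        faulted = cpu1_write(pte, recheck_valid)
    if atomic:
        old, pte.value = pte.value, 0  # one indivisible read-and-clear
    else:
        old = pte.value                # plain load ...
        if cpu1_writes == "between":
            faulted = cpu1_write(pte, recheck_valid)
        pte.value = 0                  # ... then plain store: PG_M is lost
    if cpu1_writes == "after":
        faulted = cpu1_write(pte, recheck_valid)
    # The TLB invalidation (IPI to CPU1) would happen here.
    return bool(old & PG_M), pte.value, faulted
```

Starting from `PTE(PG_V | PG_RW | PG_A)`, a write "after" the atomic clear leaves `PG_M` set in an otherwise empty PTE. That is Kostik's first outcome. With `recheck_valid=True`, the same write faults instead, which is his second outcome. The model cannot say which of the two real hardware does.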
From linda.messerschmidt at gmail.com Thu Sep 10 18:36:53 2009 From: linda.messerschmidt at gmail.com (Linda Messerschmidt) Date: Thu Sep 10 18:37:00 2009 Subject: Intermittent system hangs on 7.2-RELEASE-p1 In-Reply-To: References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com> <200908261642.59419.jhb@freebsd.org> <237c27100908271237y66219ef4o4b1b8a6e13ab2f6c@mail.gmail.com> <200908271729.55213.jhb@freebsd.org> <237c27100909100946q3d186af3h66757e0efff307a5@mail.gmail.com> Message-ID: <237c27100909101129y28771061o86db3c6a50a640eb@mail.gmail.com> On Thu, Sep 10, 2009 at 12:57 PM, Ryan Stone wrote: > You should be able to run schedgraph.py on a windows machine with python > installed. It works just fine for me on XP. Don't have any of those either, but I *did* get it working on a Mac right out of the box. Should have thought of that sooner. :) The output looks pretty straightforward, but there are a couple of things I find odd. First, there's a point right around what I estimate to be the problem time where schedgraph.py indicates gmond (the Ganglia monitor) was running uninterrupted for a period of exactly 1 second. However, it also indicates that both CPU's idle tasks were *also* running almost continuously during that time (subject to clock/net interrupts), and that the run queue on both CPU's was zero for most of that second while gmond was allegedly running. Second, the interval I graphed was about nine seconds. During that time, the PHP command line script made a whole lot of requests: it usleeps 50ms between requests, and non-broken requests average about 1.4ms. So even with the stalled request chopping 2 seconds off the end, there should be somewhere in the neighborhood of 130 requests during the graphed period. But that php process doesn't appear in the schedgraph output at all. So that doesn't make a whole lot of sense to me. I'll try to get another trace and see if that happens the same way again.
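The "neighborhood of 130 requests" estimate can be checked with a back-of-the-envelope helper. It is only a sketch. It assumes each loop iteration costs exactly the 50 ms sleep plus the 1.4 ms average request, and it ignores PHP overhead and usleep() oversleeping, so the real count would be somewhat lower.

```python
def expected_requests(interval_s, sleep_ms, request_ms, stall_s=0.0):
    """Approximate number of requests a sleep-then-request loop completes.

    interval_s: length of the traced window in seconds.
    stall_s: part of the window spent on the single stalled request.
    """
    per_iteration_s = (sleep_ms + request_ms) / 1000.0
    return int((interval_s - stall_s) / per_iteration_s)


print(expected_requests(9, 50, 1.4, stall_s=2))  # prints 136
```

Either way, well over a hundred wakeups of the php process would be expected somewhere in a nine-second trace.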
From julian at elischer.org Thu Sep 10 18:46:47 2009 From: julian at elischer.org (Julian Elischer) Date: Thu Sep 10 18:46:54 2009 Subject: Intermittent system hangs on 7.2-RELEASE-p1 In-Reply-To: <237c27100909101129y28771061o86db3c6a50a640eb@mail.gmail.com> References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com> <200908261642.59419.jhb@freebsd.org> <237c27100908271237y66219ef4o4b1b8a6e13ab2f6c@mail.gmail.com> <200908271729.55213.jhb@freebsd.org> <237c27100909100946q3d186af3h66757e0efff307a5@mail.gmail.com> <237c27100909101129y28771061o86db3c6a50a640eb@mail.gmail.com> Message-ID: <4AA94995.6030700@elischer.org> Linda Messerschmidt wrote: > On Thu, Sep 10, 2009 at 12:57 PM, Ryan Stone wrote: >> You should be able to run schedgraph.py on a windows machine with python >> installed. It works just fine for me on XP. > > Don't have any of those either, but I *did* get it working on a Mac > right out of the box. Should have thought of that sooner. :) > > The output looks pretty straightforward, but there are a couple of > things I find odd. > > First, there's a point right around what I estimate to be the problem > time where schedgraph.py indicates gmond (the Ganglia monitor) was > running uninterrupted for a period of exactly 1 second. However, it > also indicates that both CPU's idle tasks were *also* running almost > continuously during that time (subject to clock/net interrupts), and > that the run queue on both CPU's was zero for most of that second > while gmond was allegedly running. I've noticed that schedgraph tends to show the idle threads slightly skewed one way or the other. I think there is a cumulative rounding error in the way they are drawn due to the fact that they are run so often. Check the raw data and I think you will find that you just need to imagine the idle threads slightly to the left or right a bit. The longer the trace and the further to the right you are looking the more "out" the idle threads appear to be.
I saw this on both Linux and Mac python implementations. > > Second, the interval I graphed was about nine seconds. During that > time, the PHP command line script made a whole lot of requests: it > usleeps 50ms between requests, and non-broken requests average about > 1.4ms. So even with the stalled request chopping 2 seconds off the > end, there should be somewhere in the neighborhood of 130 requests > during the graphed period. But that php process doesn't appear in the > schedgraph output at all. > > So that doesn't make a whole lot of sense to me. > > I'll try to get another trace and see if that happens the same way again. > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" From linda.messerschmidt at gmail.com Thu Sep 10 19:12:37 2009 From: linda.messerschmidt at gmail.com (Linda Messerschmidt) Date: Thu Sep 10 19:12:44 2009 Subject: Intermittent system hangs on 7.2-RELEASE-p1 In-Reply-To: <4AA94995.6030700@elischer.org> References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com> <200908261642.59419.jhb@freebsd.org> <237c27100908271237y66219ef4o4b1b8a6e13ab2f6c@mail.gmail.com> <200908271729.55213.jhb@freebsd.org> <237c27100909100946q3d186af3h66757e0efff307a5@mail.gmail.com> <237c27100909101129y28771061o86db3c6a50a640eb@mail.gmail.com> <4AA94995.6030700@elischer.org> Message-ID: <237c27100909101207q73f0c513r60dd5ab83fdfd083@mail.gmail.com> On Thu, Sep 10, 2009 at 2:46 PM, Julian Elischer wrote: > I've noticed that schedgraph tends to show the idle threads slightly > skewed one way or the other. I think there is a cumulative rounding > error in the way they are drawn due to the fact that they are run so > often. Check the raw data and I think you will find that you just > need to imagine the idle threads slightly to the left or right a bit.
No, there's no period anywhere in the trace where either idle thread didn't run for an entire second. I'm pretty sure schedgraph is throwing in some nonsense results. I did capture a second, larger, dataset after a 2.1s stall, and schedgraph includes an httpd process that supposedly spent 58 seconds on the run queue. I don't know if it's a dropped record or a parsing error or what. I do think on this second graph I can kind of see the *end* of the stall, because all of a sudden a ton of processes... everything from sshd to httpd to gmond to sh to vnlru to bufdaemon to fdc0... comes off of whatever it's waiting on and hits the run queue. The combined run queues for both processors spike up to 32 tasks at one point and then rapidly tail off as things return to normal. That pretty much matches the behavior shown by ktrace in my initial post, where everything goes to sleep on something-or-other in the kernel, and then at the end of the stall, everything wakes up at the same time. I think this means the problem is somehow related to locking, rather than scheduling. From linda.messerschmidt at gmail.com Fri Sep 11 01:34:32 2009 From: linda.messerschmidt at gmail.com (Linda Messerschmidt) Date: Fri Sep 11 01:34:39 2009 Subject: Intermittent system hangs on 7.2-RELEASE-p1 In-Reply-To: <237c27100909101207q73f0c513r60dd5ab83fdfd083@mail.gmail.com> References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com> <200908261642.59419.jhb@freebsd.org> <237c27100908271237y66219ef4o4b1b8a6e13ab2f6c@mail.gmail.com> <200908271729.55213.jhb@freebsd.org> <237c27100909100946q3d186af3h66757e0efff307a5@mail.gmail.com> <237c27100909101129y28771061o86db3c6a50a640eb@mail.gmail.com> <4AA94995.6030700@elischer.org> <237c27100909101207q73f0c513r60dd5ab83fdfd083@mail.gmail.com> Message-ID: <237c27100909101834g49438707l96fa58df5f717945@mail.gmail.com> Just to follow up, I've been doing some testing with masking for KTR_LOCK rather than KTR_SCHED. 
I'm having trouble with this because I have the KTR buffer size set to 1048576 entries, and with only KTR_LOCK enabled, this isn't enough for even a full second of tracing; the sample I'm working with now is just under 0.9s. It's an average of one entry every 2001 TSC ticks. That *seems* like a lot of locking activity, but some of the lock points are only a couple of lines apart, so maybe it's just incredibly verbose. Since it's so much data and I'm still working on a way to correlate it (lockgraph.py?), all I've got so far is a list of what trace points are coming up the most:
51927 src/sys/kern/kern_lock.c:215 (_lockmgr UNLOCK mtx_unlock() when flags & LK_INTERLOCK)
48033 src/sys/kern/vfs_subr.c:2284 (vdropl UNLOCK)
41548 src/sys/kern/vfs_subr.c:2187 (vput VI_LOCK)
29359 src/sys/kern/vfs_subr.c:2067 (vget VI_LOCK)
29358 src/sys/kern/vfs_subr.c:2079 (vget VI_UNLOCK)
23799 src/sys/nfsclient/nfs_subs.c:755 (nfs_getattrcache mtx_lock)
23460 src/sys/nfsclient/nfs_vnops.c:645 (nfs_getattr mtx_unlock)
23460 src/sys/nfsclient/nfs_vnops.c:642 (nfs_getattr mtx_lock)
23460 src/sys/nfsclient/nfs_subs.c:815 (nfs_getattrcache mtx_unlock)
23138 src/sys/kern/vfs_cache.c:345 (cache_lookup CACHE_LOCK)
Unfortunately, it kind of sounds like I'm on my way to answering "why is this system slow?" even though it really isn't slow. (And I rush to point out that the Apache process in question doesn't at any point in its life touch NFS, though some of the other ones on the machine do.) In order to be the cause of my Apache problem, all this goobering around with NFS would have to be relatively infrequent but so intense that it shoves everything else out of the way. I'm skeptical, but I'm sure one of you guys can offer a more informed opinion. The only other thing I can think of is maybe all this is running me out of something I need (vnodes?) so everybody else blocks until it finishes and lets go of whatever finite resource it's using up?
But that doesn't make a ton of sense either, because why would a lack of vnodes cause stalls in accept() or select() in unrelated processes? Not sure if I'm going in the right direction here or not. From jhb at freebsd.org Fri Sep 11 15:19:28 2009 From: jhb at freebsd.org (John Baldwin) Date: Fri Sep 11 15:19:34 2009 Subject: Intermittent system hangs on 7.2-RELEASE-p1 In-Reply-To: <237c27100909101834g49438707l96fa58df5f717945@mail.gmail.com> References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com> <237c27100909101207q73f0c513r60dd5ab83fdfd083@mail.gmail.com> <237c27100909101834g49438707l96fa58df5f717945@mail.gmail.com> Message-ID: <200909111102.14503.jhb@freebsd.org> On Thursday 10 September 2009 9:34:30 pm Linda Messerschmidt wrote: > Just to follow up, I've been doing some testing with masking for > KTR_LOCK rather than KTR_SCHED. > > I'm having trouble with this because I have the KTR buffer size set to > 1048576 entries, and with only KTR_LOCK enabled, this isn't enough for > even a full second of tracing; the sample I'm working with now is just > under 0.9s. It's an average of one entry every 2001 TSC ticks. That > *seems* like a lot of locking activity, but some of the lock points > are only a couple of lines apart, so maybe it's just incredibly > verbose. 
> > Since it's so much data and I'm still working on a way to correlate it > (lockgraph.py?), all I've got so far is a list of what trace points > are coming up the most: > > 51927 src/sys/kern/kern_lock.c:215 (_lockmgr UNLOCK mtx_unlock() when > flags & LK_INTERLOCK) > 48033 src/sys/kern/vfs_subr.c:2284 (vdropl UNLOCK) > 41548 src/sys/kern/vfs_subr.c:2187 (vput VI_LOCK) > 29359 src/sys/kern/vfs_subr.c:2067 (vget VI_LOCK) > 29358 src/sys/kern/vfs_subr.c:2079 (vget VI_UNLOCK) > 23799 src/sys/nfsclient/nfs_subs.c:755 (nfs_getattrcache mtx_lock) > 23460 src/sys/nfsclient/nfs_vnops.c:645 (nfs_getattr mtx_unlock) > 23460 src/sys/nfsclient/nfs_vnops.c:642 (nfs_getattr mtx_lock) > 23460 src/sys/nfsclient/nfs_subs.c:815 (nfs_getattrcache mtx_unlock) > 23138 src/sys/kern/vfs_cache.c:345 (cache_lookup CACHE_LOCK) > > Unfortunately, it kind of sounds like I'm on my way to answering "why > is this system slow?" even though it really isn't slow. (And I rush > to point out that the Apache process in question doesn't at any point > in its life touch NFS, though some of the other ones on the machine > do.) > > In order to be the cause of my Apache problem, all this goobering > around with NFS would have to be relatively infrequent but so intense > that it shoves everything else out of the way. I'm skeptical, but I'm > sure one of you guys can offer a more informed opinion. > > The only other thing I can think of is maybe all this is running me > out of something I need (vnodes?) so everybody else blocks until it > finishes and lets go of whatever finite resource it's using up? But > that doesn't make a ton of sense either, because why would a lack of > vnodes cause stalls in accept() or select() in unrelated processes? > > Not sure if I'm going in the right direction here or not. Try turning off KTR_LOCK for spin mutexes (just force LO_QUIET on in mtx_init() if MTX_SPIN is set) and use a schedgraph.py from the latest RELENG_7. 
It knows how to parse KTR_LOCK events and drop event "bars" for locks showing when they are held. A more recently schedgraph.py might also fix the bugs you were seeing with the idle threads looking too long (esp. at the start and end of graphs). -- John Baldwin From julian at elischer.org Fri Sep 11 15:35:17 2009 From: julian at elischer.org (Julian Elischer) Date: Fri Sep 11 15:35:24 2009 Subject: Intermittent system hangs on 7.2-RELEASE-p1 In-Reply-To: <200909111102.14503.jhb@freebsd.org> References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com> <237c27100909101207q73f0c513r60dd5ab83fdfd083@mail.gmail.com> <237c27100909101834g49438707l96fa58df5f717945@mail.gmail.com> <200909111102.14503.jhb@freebsd.org> Message-ID: <4AAA6E32.2080609@elischer.org> John Baldwin wrote: > > > A more recently schedgraph.py might also > fix the bugs you were seeing with the idle threads looking too long (esp. at > the start and end of graphs). not unless something has been fixed in the last week or so. From linda.messerschmidt at gmail.com Fri Sep 11 17:35:02 2009 From: linda.messerschmidt at gmail.com (Linda Messerschmidt) Date: Fri Sep 11 17:35:08 2009 Subject: Intermittent system hangs on 7.2-RELEASE-p1 In-Reply-To: <200909111102.14503.jhb@freebsd.org> References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com> <237c27100909101207q73f0c513r60dd5ab83fdfd083@mail.gmail.com> <237c27100909101834g49438707l96fa58df5f717945@mail.gmail.com> <200909111102.14503.jhb@freebsd.org> Message-ID: <237c27100909111035y544e8c91hc7726fd6ef16e351@mail.gmail.com> On Fri, Sep 11, 2009 at 11:02 AM, John Baldwin wrote: > Try turning off KTR_LOCK for spin mutexes (just force LO_QUIET on in > mtx_init() if MTX_SPIN is set) I have *no* idea what you just said. :) Which is fine. But more to the point, I have no idea how to do it. :) > A more recently schedgraph.py might also > fix the bugs you were seeing with the idle threads looking too long (esp. 
at > the start and end of graphs). We are already on RELENG_7 due to the KTR-enabling rebuild, so that'd be the version we're using unless, as Julian observed, it's been fixed in the past week or so. Thanks! From jhb at freebsd.org Fri Sep 11 19:14:46 2009 From: jhb at freebsd.org (John Baldwin) Date: Fri Sep 11 19:14:53 2009 Subject: Intermittent system hangs on 7.2-RELEASE-p1 In-Reply-To: <4AAA6E32.2080609@elischer.org> References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com> <200909111102.14503.jhb@freebsd.org> <4AAA6E32.2080609@elischer.org> Message-ID: <200909111300.37599.jhb@freebsd.org> On Friday 11 September 2009 11:35:14 am Julian Elischer wrote: > John Baldwin wrote: > > > > > > A more recently schedgraph.py might also > > fix the bugs you were seeing with the idle threads looking too long (esp. at > > the start and end of graphs). > > not unless something has been fixed in the last week or so. Well, I wasn't sure how old of a schedgraph.py is being used. 7.1 would have the bugs, but I think 7.2 should be fine. -- John Baldwin From jhb at freebsd.org Fri Sep 11 19:14:48 2009 From: jhb at freebsd.org (John Baldwin) Date: Fri Sep 11 19:15:04 2009 Subject: Intermittent system hangs on 7.2-RELEASE-p1 In-Reply-To: <237c27100909111035y544e8c91hc7726fd6ef16e351@mail.gmail.com> References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com> <200909111102.14503.jhb@freebsd.org> <237c27100909111035y544e8c91hc7726fd6ef16e351@mail.gmail.com> Message-ID: <200909111506.47309.jhb@freebsd.org> On Friday 11 September 2009 1:35:00 pm Linda Messerschmidt wrote: > On Fri, Sep 11, 2009 at 11:02 AM, John Baldwin wrote: > > Try turning off KTR_LOCK for spin mutexes (just force LO_QUIET on in > > mtx_init() if MTX_SPIN is set) > > I have *no* idea what you just said. :) > > Which is fine. But more to the point, I have no idea how to do it. 
:) Something like this:

Index: sys/kern/kern_mutex.c
===================================================================
--- sys/kern/kern_mutex.c	(.../mirror/FreeBSD/stable/7)	(revision 195943)
+++ sys/kern/kern_mutex.c	(.../stable/7)	(revision 195943)
@@ -747,6 +747,10 @@
 	if (opts & MTX_NOPROFILE)
 		flags |= LO_NOPROFILE;
 
+	/* XXX: Only log for regular mutexes. */
+	if (opts & MTX_SPIN)
+		flags |= LO_QUIET;
+
 	/* Initialize mutex. */
 	m->mtx_lock = MTX_UNOWNED;
 	m->mtx_recurse = 0;

> > A more recently schedgraph.py might also > > fix the bugs you were seeing with the idle threads looking too long (esp. at > > the start and end of graphs). > > We are already on RELENG_7 due to the KTR-enabling rebuild, so that'd > be the version we're using unless, as Julian observed, it's been fixed > in the past week or so. Hmm. It works well for me for doing traces. -- John Baldwin From jilles at stack.nl Fri Sep 11 23:14:24 2009 From: jilles at stack.nl (Jilles Tjoelker) Date: Fri Sep 11 23:14:30 2009 Subject: Problem in bin/sh stripping the * character through ${expansion%} In-Reply-To: References: <4A7B1DB0.1040602@FreeBSD.org> Message-ID: <20090911231422.GA41683@stack.nl> On Fri, Aug 07, 2009 at 03:26:50AM +0400, Eygene Ryabinkin wrote: > Thu, Aug 06, 2009 at 11:15:12AM -0700, Doug Barton wrote: > > I came across this problem during a recent portmaster update. When > > trying to strip off the * character using variable expansion in bin/sh > > it doesn't work. Other "special" characters do work if they are > > properly escaped. > > The attached mini-script clearly shows the problem: > > $ sh sh-strip-problem > > var before stripping: foo\* > > var after stripping: foo\* > > var before stripping: foo\$ > > var after stripping: foo\ > According to the sh(1), it is not a problem.
Namely, > - \* being unquoted at all will produce a lone '*'; > - '*' when treated as the smallest pattern, will result in a stripping > of a zero-length string -- it is the smallest pattern in the case of > '*' that matches anything. That is indeed an explanation of why it works that way, but I think it is wrong. Generally, the shell command language avoids unnecessary levels of quoting. In the POSIX spec, "Shell Command Language", note the part about "${x#*}" (pattern) and ${x#"*"} (literal asterisk). Also compare with case $something in \*) echo asterisk;; esac which matches a literal asterisk. Two PRs already exist for aspects of stripping: bin/57554 (double quotes) and bin/117748 (trying to match pattern matching characters literally). > In order to strip the trailing star you should use > ----- > var=${var%[*]} > ----- > This gives you the pattern of '[*]' that is properly treated as the > single star -- it's a weird way to escape the star in the patterns. This is indeed a good workaround. -- Jilles Tjoelker From linda.messerschmidt at gmail.com Sat Sep 12 02:05:16 2009 From: linda.messerschmidt at gmail.com (Linda Messerschmidt) Date: Sat Sep 12 02:05:23 2009 Subject: Intermittent system hangs on 7.2-RELEASE-p1 In-Reply-To: <200909111506.47309.jhb@freebsd.org> References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com> <200909111102.14503.jhb@freebsd.org> <237c27100909111035y544e8c91hc7726fd6ef16e351@mail.gmail.com> <200909111506.47309.jhb@freebsd.org> Message-ID: <237c27100909111905y244924c1n93b4e4d9ceda44be@mail.gmail.com> On Fri, Sep 11, 2009 at 3:06 PM, John Baldwin wrote: > Something like this: Ah, I understand now. :) Got up to 17 seconds of trace with that change. > Hmm. It works well for me for doing traces. It definitely works; it just always seems to have some-or-another weird artifact. But, with the lock info added, the locks that show big ugly gaping multi-second "lock acquire" bars are: unp_mtx and so_rcv_sx.
I'm not 100% confident in this data yet, so I will try to get more data to confirm, but if that offers any clues about where to look, I'm all ears. I'm also a bit hazy on what the dark grey vs. light grey background is about. Thanks! From linda.messerschmidt at gmail.com Sat Sep 12 03:55:36 2009 From: linda.messerschmidt at gmail.com (Linda Messerschmidt) Date: Sat Sep 12 03:55:43 2009 Subject: Intermittent system hangs on 7.2-RELEASE-p1 In-Reply-To: <237c27100909111905y244924c1n93b4e4d9ceda44be@mail.gmail.com> References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com> <200909111102.14503.jhb@freebsd.org> <237c27100909111035y544e8c91hc7726fd6ef16e351@mail.gmail.com> <200909111506.47309.jhb@freebsd.org> <237c27100909111905y244924c1n93b4e4d9ceda44be@mail.gmail.com> Message-ID: <237c27100909112055i35612b4btbfbecb8b5dd1568c@mail.gmail.com> OK, I have learned that ktrdump looks up the name of the process associated with a particular KSE at the time of the dump, so if it's changed since tracing stopped, it will blissfully blame the wrong process. I understand why that's the case, but it still sucks for troubleshooting. :( This time, "pf task mtx" and "vnode_free_list" are the locks getting the blame. The processes fingered are an httpd (the root "parent" of the one doing the work, which does nothing but select() for 1s and wait to see if its children died), and vnlru. No correlation at all to the previous results, and this machine is now utterly quiescent except for the httpd process and the PHP exerciser. Hard to imagine vnlru has 1s worth of running to do on a machine with 949 total vnodes in use. A third run produced a 997ms "lock acquire" for "buffer daemon lock," a 497ms one for ip6qlock (no, there's no IPv6 in use on this machine), and an 8s (!!!) one on unp_mtx. bufdaemon had a 997s "running" bar, but according to the raw TSC values, that happened on the same CPU 1.999s *after* the 997ms buffer daemon lock acquire.
I really don't know where to go from here. There's so little consistency that I'm just not sure if the data is bad, the tool is bad, the operator is bad, or there's some problem so fundamentally horrible that all I'm seeing is random side effects. From julian at elischer.org Sat Sep 12 04:06:15 2009 From: julian at elischer.org (Julian Elischer) Date: Sat Sep 12 04:06:22 2009 Subject: Intermittent system hangs on 7.2-RELEASE-p1 In-Reply-To: <237c27100909112055i35612b4btbfbecb8b5dd1568c@mail.gmail.com> References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com> <200909111102.14503.jhb@freebsd.org> <237c27100909111035y544e8c91hc7726fd6ef16e351@mail.gmail.com> <200909111506.47309.jhb@freebsd.org> <237c27100909111905y244924c1n93b4e4d9ceda44be@mail.gmail.com> <237c27100909112055i35612b4btbfbecb8b5dd1568c@mail.gmail.com> Message-ID: <4AAB1E34.2060908@elischer.org> Linda Messerschmidt wrote: > OK, I have learned that ktrdump looks up the name of the process > associated with a particular KSE at the time of the dump, so if > it's changed since tracing stopped, it will blissfully blame the wrong > process. I understand why that's the case, but it still sucks for > troubleshooting. :( > > This time, "pf task mtx" and "vnode_free_list" are the locks getting > the blame. The processes fingered are an httpd (the root "parent" > of the one doing the work, which does nothing but select() for 1s and > wait to see if its children died), and vnlru. No correlation at all > to the previous results, and this machine is now utterly quiescent > except for the httpd process and the PHP exerciser. Hard to imagine > vnlru has 1s worth of running to do on a machine with 949 total vnodes > in use. > > A third run produced a 997ms "lock acquire" for "buffer daemon lock," > a 497ms one for ip6qlock (no, there's no IPv6 in use on this machine), > and an 8s (!!!) one on unp_mtx.
bufdaemon had a 997s "running" bar, > but according to the raw TSC values, that happened on the same CPU > 1.999s *after* the 997ms buffer daemon lock acquire. > > I really don't know where to go from here. There's so little > consistency that I'm just not sure if the data is bad, the tool is > bad, the operator is bad, or there's some problem so fundamentally > horrible that all I'm seeing is random side effects. > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" does the system have a serial console? how about a normal console/keyboard? how often does it hang? and for how long? is there a chance that you could notice when it is hung and hit and drop it into the debugger IN the hung state? If you have a serial port, it is possible to make a program that sends a char back and forth and, when the machine hangs, sends the magic sequence. (I think it's CR for serial debugger break, but I'm sure you can look up the kernel options and the chars in google.) if you can drop the machine into DDB (the kernel debugger) in the hung state, then there are lots of commands you can use to find out what is wrong. jhb actually gave a short talk on the topic that I videoed and put on youtube. ps will show you what is actually running on which CPU, and you can see what locks all the other processes are waiting on. then you can examine those locks and see who owns them.
From linda.messerschmidt at gmail.com Sat Sep 12 04:47:33 2009 From: linda.messerschmidt at gmail.com (Linda Messerschmidt) Date: Sat Sep 12 04:47:39 2009 Subject: Intermittent system hangs on 7.2-RELEASE-p1 In-Reply-To: <4AAB1E34.2060908@elischer.org> References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com> <200909111102.14503.jhb@freebsd.org> <237c27100909111035y544e8c91hc7726fd6ef16e351@mail.gmail.com> <200909111506.47309.jhb@freebsd.org> <237c27100909111905y244924c1n93b4e4d9ceda44be@mail.gmail.com> <237c27100909112055i35612b4btbfbecb8b5dd1568c@mail.gmail.com> <4AAB1E34.2060908@elischer.org> Message-ID: <237c27100909112147h64f71585p2a97f2b48a510985@mail.gmail.com> On Sat, Sep 12, 2009 at 12:06 AM, Julian Elischer wrote: > does the system have a serial console? how about a normal console /keyboard? It has an IP KVM. > how often does it hang? and for how long? Well, this is interesting. I got really frustrated with the other approach, so I thought I'd thin a machine down absolutely as far as I could, eliminate every possible source of delay, and see what happens. I killed everything: cron, RPC, NFS, devd, gmon, nrpe, everything. The Apache and its exerciser are now the only things running on the machine, and the Apache is only touching an md0 swap device mounted on /mnt. I *still* get the hangs. It hangs for all sorts of different periods, but the duration of the stall is approximately inversely proportional to the chance of seeing it. To get a short delay, you need wait only a little bit. If you want a 2-3 second delay, you may have to wait 15-20 minutes. *However*, in order to answer your question, I changed up the test program, which up until now has been cycling requests every 50 ms until it gets one >2s, at which point it sysctls to stop ktr and aborts. Now it prints the timestamp of all "too long" requests.
But I also dropped the threshold for "too long" from 2s to 100ms, since with everything on RAM disk, there's no longer any reason to expect a request to take more than 1-2ms in the worst case. The results are pretty profound:

1252729876: request 82 131ms
1252729883: request 210 388ms
1252729890: request 338 380ms
1252729897: request 466 388ms
1252729904: request 594 404ms
1252729919: request 849 810ms
1252729926: request 977 386ms
1252729933: request 1105 370ms
1252729940: request 1233 366ms
1252729947: request 1361 400ms
1252729961: request 1617 746ms
1252729968: request 1744 477ms
1252729975: request 1872 388ms
1252729982: request 2000 380ms
1252729989: request 2128 384ms
1252729996: request 2256 395ms

It goes on and on like this: I get a 380-400ms stall every seven seconds. I have had a few come back higher, in the 750-850ms range, usually after missing a beat:

1252729897: request 466 388ms
1252729904: request 594 404ms
1252729919: request 849 810ms
1252729926: request 977 386ms

1252730010: request 2512 416ms
1252730017: request 2640 390ms
1252730031: request 2896 774ms
1252730038: request 3023 431ms

1252730454: request 10568 378ms
1252730461: request 10696 397ms
1252730475: request 10952 733ms
1252730482: request 11080 366ms

So far, nothing over 1s. So what happens every seven seconds??
From julian at elischer.org Sat Sep 12 05:47:15 2009 From: julian at elischer.org (Julian Elischer) Date: Sat Sep 12 05:47:22 2009 Subject: Intermittent system hangs on 7.2-RELEASE-p1 In-Reply-To: <237c27100909112147h64f71585p2a97f2b48a510985@mail.gmail.com> References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com> <200909111102.14503.jhb@freebsd.org> <237c27100909111035y544e8c91hc7726fd6ef16e351@mail.gmail.com> <200909111506.47309.jhb@freebsd.org> <237c27100909111905y244924c1n93b4e4d9ceda44be@mail.gmail.com> <237c27100909112055i35612b4btbfbecb8b5dd1568c@mail.gmail.com> <4AAB1E34.2060908@elischer.org> <237c27100909112147h64f71585p2a97f2b48a510985@mail.gmail.com> Message-ID: <4AAB35E0.3000908@elischer.org> Linda Messerschmidt wrote: > On Sat, Sep 12, 2009 at 12:06 AM, Julian Elischer wrote: >> does the system have a serial console? how about a normal console /keyboard? > > It has an IP KVM. > >> how often does it hang? and for how long? > > Well, this is interesting. I got really frustrated with the other > approach, so I thought I'd thin a machine down absolutely as far as I > could, eliminate every possible source of delay, and see what happens. > I killed everything... cron, RPC, NFS, devd, gmon, nrpe, everything. > The Apache and its exerciser are now the only things running on the > machine, and the Apache is only touching an md0 swap device mounted on > /mnt. I *still* get the hangs. ok now we need to describe the hang.. if you can predictably get a hang every 7 seconds, does this mean that it doesn't respond to keyboard for a moment every 7 seconds? or that it doesn't accept packets every 7 seconds? if you lean on the A key, do you see echo stop every 7 seconds for a moment? Or is it just the apache process that hangs? Does the watching process that you refer to below also hang? would it hang if it tried to access the disk?
if the watching process is on the same machine, does it only trigger AFTER the request has taken a long time or could it time out with a select DURING the delayed response? (another way of asking "how hung is 'hung'?") > > It hangs for all sorts of different periods, but the duration of the > stall is approximately inversely proportional to the chance of seeing > it. To get a short delay, you need wait only a little bit. If you > want a 2-3 second delay, you may have to wait 15-20 minutes. > > *However* in order to answer your question, I changed up the test > program, which up til now has been cycling requests every 50 ms until > it gets one >2s, at which point it sysctls to stop ktr and aborts. > > Now it prints the timestamp of all "too long" requests. But I also > dropped the threshold for "too long" from 2s to 100ms, since with > everything on RAM disk, there's no longer any reason to expect a > request to take more than 1-2ms in the worst case. > > The results are pretty profound:
>
> 1252729876: request 82 131ms
> 1252729883: request 210 388ms
> 1252729890: request 338 380ms
> 1252729897: request 466 388ms
> 1252729904: request 594 404ms
> 1252729919: request 849 810ms
> 1252729926: request 977 386ms
> 1252729933: request 1105 370ms
> 1252729940: request 1233 366ms
> 1252729947: request 1361 400ms
> 1252729961: request 1617 746ms
> 1252729968: request 1744 477ms
> 1252729975: request 1872 388ms
> 1252729982: request 2000 380ms
> 1252729989: request 2128 384ms
> 1252729996: request 2256 395ms
>
> It goes on and on like this, I get a 380-400ms stall every seven > seconds.
I have had a few come back higher, in the 750-850ms range, > usually after missing a beat:
>
> 1252729897: request 466 388ms
> 1252729904: request 594 404ms
> 1252729919: request 849 810ms
> 1252729926: request 977 386ms
>
> 1252730010: request 2512 416ms
> 1252730017: request 2640 390ms
> 1252730031: request 2896 774ms
> 1252730038: request 3023 431ms
>
> 1252730454: request 10568 378ms
> 1252730461: request 10696 397ms
> 1252730475: request 10952 733ms
> 1252730482: request 11080 366ms
>
> So far, nothing over 1s.
>
> So what happens every seven seconds??
From linda.messerschmidt at gmail.com Sat Sep 12 06:52:52 2009 From: linda.messerschmidt at gmail.com (Linda Messerschmidt) Date: Sat Sep 12 06:52:59 2009 Subject: Intermittent system hangs on 7.2-RELEASE-p1 In-Reply-To: <4AAB35E0.3000908@elischer.org> References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com> <200909111102.14503.jhb@freebsd.org> <237c27100909111035y544e8c91hc7726fd6ef16e351@mail.gmail.com> <200909111506.47309.jhb@freebsd.org> <237c27100909111905y244924c1n93b4e4d9ceda44be@mail.gmail.com> <237c27100909112055i35612b4btbfbecb8b5dd1568c@mail.gmail.com> <4AAB1E34.2060908@elischer.org> <237c27100909112147h64f71585p2a97f2b48a510985@mail.gmail.com> <4AAB35E0.3000908@elischer.org> Message-ID: <237c27100909112352k5504357dge725c8f905ee650a@mail.gmail.com> On Sat, Sep 12, 2009 at 1:47 AM, Julian Elischer wrote: > ok now we need to describe the hang.. if you can predictably get a hang > every 7 seconds does this mean that it doesn't respond to keyboard for a > moment every 7 seconds? It's possible. > or that it doesn't accept packets every 7 seconds?
It appears that it accepts & responds to at least pings; I was able to do an every-0.1-seconds ping through a bevy of 300-1900ms stalls with:

2323 packets transmitted, 2323 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.120/1.019/5.979/0.288 ms

As best as I could tell, schedgraph also showed that the clock interrupt and the em0 interrupt always got serviced on time. Pretty much seems like it's userspace that's getting put on hold. > Or is it just the apache process that hangs? This is where I started from. In the original post (way long ago now), I described how pretty much every process on the system went into the kernel for something and stalled there, and then when the stall ended, they all unblocked at once. I posted some examples via ktrace that I sadly no longer have the source data for. > Does the watching process that you refer to below also hang? I don't think I can say for sure. I observe visual stalls from time to time in the output if I have it show every request where there is no stall shown, which could either indicate that a stall occurred outside the request or that my shoddy Internet connection has 100ms latency and consistent 1% packet loss, which it does. I did write a short C program that just select()s on stdin for 100ms over and over and aborts if it takes more than 125ms to go through the loop; it never aborts, even through 1s+ stalls, and the loop times it reports are consistently 110ms regardless of what else is going on, which I don't think is unexpected. However, I'm not sure why that differs from the behavior of the "master" Apache processes, which select() for 1 second all day long, but do appear to be affected. Maybe because they are selecting a network socket instead of a tty? I don't know. Also, if I disable NTP, the system does not appear to lose time during the stalls, which fits with the consistent clock interrupts I saw. > would it hang if it tried to access the disk?
By using the md device, I believe I have removed the disk from the equation; neither process is accessing it. Even without doing that, if I leave iostat -w 1 running alongside the test, there's no correlation between the tiny amount of disk activity there is and observed stalls. > if the watching process is on the same machine, does it only trigger AFTER > the request has taken a long time or could it time out with a select DURING > the delayed response? (another way of asking "how hung > is 'hung'?") It's just a PHP script using libcurl to request the file. I only moved it to the same machine in order to have it be able to write the sysctl to stop the KTR traces I was doing. If you're asking could the check script be modified to time out after, say, 1 second, and if so, would it return during the hang or after it? I don't know. My guess based on the earlier ktrace output is that it would time out, but not return until the hang ended. I'll see if the curl lib exposes a configurable timeout and try it.
From linda.messerschmidt at gmail.com Sat Sep 12 06:55:09 2009 From: linda.messerschmidt at gmail.com (Linda Messerschmidt) Date: Sat Sep 12 06:55:15 2009 Subject: Intermittent system hangs on 7.2-RELEASE-p1 In-Reply-To: <237c27100909112352k5504357dge725c8f905ee650a@mail.gmail.com> References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com> <200909111102.14503.jhb@freebsd.org> <237c27100909111035y544e8c91hc7726fd6ef16e351@mail.gmail.com> <200909111506.47309.jhb@freebsd.org> <237c27100909111905y244924c1n93b4e4d9ceda44be@mail.gmail.com> <237c27100909112055i35612b4btbfbecb8b5dd1568c@mail.gmail.com> <4AAB1E34.2060908@elischer.org> <237c27100909112147h64f71585p2a97f2b48a510985@mail.gmail.com> <4AAB35E0.3000908@elischer.org> <237c27100909112352k5504357dge725c8f905ee650a@mail.gmail.com> Message-ID: <237c27100909112355xbf1354djfe0b562195546bca@mail.gmail.com> On Sat, Sep 12, 2009 at 2:52 AM, Linda Messerschmidt wrote: > On Sat, Sep 12, 2009 at 1:47 AM, Julian Elischer wrote: >> ok now we need to describe the hang.. if you can predictably get a hang >> every 7 seconds does this mean that it doesn't respond to keyboard for a >> moment every 7 seconds? > > It's possible. Oops, I meant to explain that my ISP connection and personal sense of time are probably not good enough to say one way or the other for sure. I do see stalls, but I can't say whether they are the same stall or just a dropped packet somewhere along the way.
From linda.messerschmidt at gmail.com Sat Sep 12 07:52:24 2009 From: linda.messerschmidt at gmail.com (Linda Messerschmidt) Date: Sat Sep 12 07:52:29 2009 Subject: Intermittent system hangs on 7.2-RELEASE-p1 In-Reply-To: <237c27100909112352k5504357dge725c8f905ee650a@mail.gmail.com> References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com> <200909111102.14503.jhb@freebsd.org> <237c27100909111035y544e8c91hc7726fd6ef16e351@mail.gmail.com> <200909111506.47309.jhb@freebsd.org> <237c27100909111905y244924c1n93b4e4d9ceda44be@mail.gmail.com> <237c27100909112055i35612b4btbfbecb8b5dd1568c@mail.gmail.com> <4AAB1E34.2060908@elischer.org> <237c27100909112147h64f71585p2a97f2b48a510985@mail.gmail.com> <4AAB35E0.3000908@elischer.org> <237c27100909112352k5504357dge725c8f905ee650a@mail.gmail.com> Message-ID: <237c27100909120052k1db7e029xcf36e075865d29d8@mail.gmail.com> OK, first, I figured out the seven-second thing. I actually had already found that particular issue earlier in the troubleshooting process, but forgot all about it when I pulled in a second machine to test with. It was simply a case of setting Apache's MaxRequestsPerChild to a very low value (128) in combination with only allowing 1 access at a time. 128 requests * (50ms sleep + 2ms request + overhead) ~= 7s. So that was just noise masking the real problem, which is less frequent and less predictable. Sorry for the red herring. :( On Sat, Sep 12, 2009 at 2:52 AM, Linda Messerschmidt wrote: > If you're asking could the check script be modified to time out after, > say, 1 second, and if so, would it return during the hang or after it? > I don't know. My guess based on the earlier ktrace output is that it > would time out, but not return until the hang ended. I'll see if > the curl lib exposes a configurable timeout and try it. This proved to be quite easy to do. I ran the script twice, once with the timeout and once without.
Without timeout:

1252741492: request 910 101ms
1252741567: request 2133 1429ms
1252741603: request 2722 146ms

With 1s timeout:

1252741492: request 1078 106ms
1252741567: request 2302 1010ms (<--- Timeout)
1252741567: request 2303 273ms (<--- after 50ms sleep, goes back to end of stall)
1252741603: request 2892 136ms

As you can see, the two scripts experience stalls in pretty much lockstep, but the script itself does not appear affected, so it's just on the Apache side. From des at des.no Sat Sep 12 13:54:01 2009 From: des at des.no (Dag-Erling Smørgrav) Date: Sat Sep 12 13:54:07 2009 Subject: DDB capture buffer Message-ID: <86fxasl154.fsf@ds4.des.no> The default maximum size of the DDB capture buffer is 48 kB. This is ridiculously low; it's not even nearly enough to capture the output from the first example in textdump(4):

script kdb.enter.panic=textdump set; capture on; show allpcpu; bt; ps; alltrace; show alllocks; call doadump; reset

Would anyone object to increasing it to 1 MB? DDB is opt-in, so it will only affect people who compile it into their kernel (or -CURRENT users who don't compile it out; they have it coming). DES -- Dag-Erling Smørgrav - des@des.no From gary.jennejohn at freenet.de Sun Sep 13 08:43:00 2009 From: gary.jennejohn at freenet.de (Gary Jennejohn) Date: Sun Sep 13 08:43:06 2009 Subject: DDB capture buffer In-Reply-To: <86fxasl154.fsf@ds4.des.no> References: <86fxasl154.fsf@ds4.des.no> Message-ID: <20090913104257.7e042f45@ernst.jennejohn.org> On Sat, 12 Sep 2009 15:53:59 +0200 Dag-Erling Smørgrav wrote: > The default maximum size of the DDB capture buffer is 48 kB. This is > ridiculously low; it's not even nearly enough to capture the output from > the first example in textdump(4): > > script kdb.enter.panic=textdump set; capture on; show allpcpu; bt; > ps; alltrace; show alllocks; call doadump; reset > > Would anyone object to increasing it to 1 MB?
DDB is opt-in, so it will > only affect people who compile it into their kernel (or -CURRENT users > who don't compile it out; they have it coming). > I'd say it's a good idea, as long as you put a warning in UPDATING for people using e.g. embedded devices with little memory. It's reasonable to expect such users to customize their kernel configs. --- Gary Jennejohn From ken at mthelicon.com Sun Sep 13 17:38:05 2009 From: ken at mthelicon.com (Pegasus Mc Cleaft) Date: Sun Sep 13 17:38:13 2009 Subject: Changes in IPv6 Configuration Message-ID: <200909131837.56319.ken@mthelicon.com> Hello Current and Hackers, With the recent changes to /etc/rc.d for network start-up. I was wondering what is now correct. The previously working ipv6 configuration no longer creates a static default route, and I have not been able to figure out why. After boot, if I manually add the default route for ipv6, all works OK but I must be missing something to make it happen automatically. Currently, I have this in my /etc/rc.conf and this does not work. Any help would be appreciated.

ipv6_prefer="YES"
ifconfig_re0_ipv6="inet6 2001:4d48:ad51:32:21d:7dff:fe07:241a prefixlen 64"
ipv6_defaultrouter="2001:4d48:ad51:32::3"
ipv6_network_interfaces="auto"
ipv6_default_interface="re0"

Thanks in advance, Peg From ken at mthelicon.com Sun Sep 13 18:14:05 2009 From: ken at mthelicon.com (Pegasus Mc Cleaft) Date: Sun Sep 13 18:14:18 2009 Subject: Changes in IPv6 Configuration In-Reply-To: <20090913175656.K68375@maildrop.int.zabbadoz.net> References: <200909131837.56319.ken@mthelicon.com> <20090913175656.K68375@maildrop.int.zabbadoz.net> Message-ID: <200909131913.53853.ken@mthelicon.com> On Sunday 13 September 2009 18:58:02 Bjoern A. Zeeb wrote: > On Sun, 13 Sep 2009, Pegasus Mc Cleaft wrote: > > Hi, > > > With the recent changes to /etc/rc.d for network start-up. I was > > wondering what is now correct.
The previously working ipv6 configuration > > no longer creates a static default route, and I have not been able to > > figure out why. After boot, if I manually add the default route for ipv6, > > all works OK but I must be missing something to make it happen > > automatically. Currently, I have this in my /etc/rc.conf and this does > > not work. Any help would be appreciated.
> >
> > ipv6_prefer="YES"
> > ifconfig_re0_ipv6="inet6 2001:4d48:ad51:32:21d:7dff:fe07:241a prefixlen 64"
> > ipv6_defaultrouter="2001:4d48:ad51:32::3"
> > ipv6_network_interfaces="auto"
> > ipv6_default_interface="re0"
>
> can you try this change (just pasted in):
>
> Index: etc/rc.d/routing
> ===================================================================
> --- etc/rc.d/routing	(revision 197153)
> +++ etc/rc.d/routing	(working copy)
> @@ -132,7 +132,7 @@
>  	if [ -n "${ipv6_static_routes}" ]; then
>  		for i in ${ipv6_static_routes}; do
>  			ipv6_route_args=`get_if_var $i ipv6_route_IF`
> -			route ${_action} -inet6 ${route_args}
> +			route ${_action} -inet6 ${ipv6_route_args}
>  		done
>  	fi
>
> /bz
>

Hi Bjoern, Thank you very much. That change did work and now the IPv6 default gateway is being added to the route table on start-up. Cheers, Peg From bzeeb-lists at lists.zabbadoz.net Sun Sep 13 18:19:17 2009 From: bzeeb-lists at lists.zabbadoz.net (Bjoern A. Zeeb) Date: Sun Sep 13 18:19:24 2009 Subject: Changes in IPv6 Configuration In-Reply-To: <200909131837.56319.ken@mthelicon.com> References: <200909131837.56319.ken@mthelicon.com> Message-ID: <20090913175656.K68375@maildrop.int.zabbadoz.net> On Sun, 13 Sep 2009, Pegasus Mc Cleaft wrote: Hi, > With the recent changes to /etc/rc.d for network start-up. I was wondering > what is now correct. The previously working ipv6 configuration no longer > creates a static default route, and I have not been able to figure out why.
> After boot, if I manually add the default route for ipv6, all works OK but I > must be missing something to make it happen automatically. Currently, I have > this in my /etc/rc.conf and this does not work. Any help would be appreciated.
>
> ipv6_prefer="YES"
> ifconfig_re0_ipv6="inet6 2001:4d48:ad51:32:21d:7dff:fe07:241a prefixlen 64"
> ipv6_defaultrouter="2001:4d48:ad51:32::3"
> ipv6_network_interfaces="auto"
> ipv6_default_interface="re0"

can you try this change (just pasted in):

Index: etc/rc.d/routing
===================================================================
--- etc/rc.d/routing	(revision 197153)
+++ etc/rc.d/routing	(working copy)
@@ -132,7 +132,7 @@
 	if [ -n "${ipv6_static_routes}" ]; then
 		for i in ${ipv6_static_routes}; do
 			ipv6_route_args=`get_if_var $i ipv6_route_IF`
-			route ${_action} -inet6 ${route_args}
+			route ${_action} -inet6 ${ipv6_route_args}
 		done
 	fi

/bz -- Bjoern A. Zeeb What was I talking about and who are you again? From bzeeb-lists at lists.zabbadoz.net Sun Sep 13 20:20:14 2009 From: bzeeb-lists at lists.zabbadoz.net (Bjoern A. Zeeb) Date: Sun Sep 13 20:20:26 2009 Subject: Changes in IPv6 Configuration In-Reply-To: <200909131913.53853.ken@mthelicon.com> References: <200909131837.56319.ken@mthelicon.com> <20090913175656.K68375@maildrop.int.zabbadoz.net> <200909131913.53853.ken@mthelicon.com> Message-ID: <20090913201914.A68375@maildrop.int.zabbadoz.net> On Sun, 13 Sep 2009, Pegasus Mc Cleaft wrote: Hi, > On Sunday 13 September 2009 18:58:02 Bjoern A. Zeeb wrote: >> On Sun, 13 Sep 2009, Pegasus Mc Cleaft wrote: >> >> Hi, >> >>> With the recent changes to /etc/rc.d for network start-up. I was >>> wondering what is now correct. The previously working ipv6 configuration >>> no longer creates a static default route, and I have not been able to >>> figure out why. After boot, if I manually add the default route for ipv6, >>> all works OK but I must be missing something to make it happen >>> automatically.
Currently, I have this in my /etc/rc.conf and this does >>> not work. Any help would be appreciated.
>>>
>>> ipv6_prefer="YES"
>>> ifconfig_re0_ipv6="inet6 2001:4d48:ad51:32:21d:7dff:fe07:241a prefixlen 64"
>>> ipv6_defaultrouter="2001:4d48:ad51:32::3"
>>> ipv6_network_interfaces="auto"
>>> ipv6_default_interface="re0"
>>
>> can you try this change (just pasted in):
>>
>> Index: etc/rc.d/routing
>> ===================================================================
>> --- etc/rc.d/routing	(revision 197153)
>> +++ etc/rc.d/routing	(working copy)
>> @@ -132,7 +132,7 @@
>>  	if [ -n "${ipv6_static_routes}" ]; then
>>  		for i in ${ipv6_static_routes}; do
>>  			ipv6_route_args=`get_if_var $i ipv6_route_IF`
>> -			route ${_action} -inet6 ${route_args}
>> +			route ${_action} -inet6 ${ipv6_route_args}
>>  		done
>>  	fi
>>
>> /bz
>>
>
> Thank you very much. That change did work and now the IPv6 default gateway > is being added to the route table on start-up. Thanks a lot for reporting and testing. I just committed the correction. /bz -- Bjoern A. Zeeb What was I talking about and who are you again? From fabio at tranchitella.it Mon Sep 14 09:31:46 2009 From: fabio at tranchitella.it (Fabio Tranchitella) Date: Mon Sep 14 11:32:41 2009 Subject: Distro Summit 2010: Call for Papers Message-ID: <20090914091641.GE23252@mail.26dimensions.com>

===============
CALL FOR PAPERS
===============

Distro Summit 2010 is a one-day technical conference with a strong focus on collaboration between Free Software distributions, hosted at linux.conf.au 2010. We are looking for proposals from any Free Software distribution, from the typical full distributions (both linux and non-linux) to the niche market derivatives. Besides the strong focus on collaboration between Free Software distributions, topics may include packaging, maintenance, relationship with upstream developers, release management and QA. For more information, please visit: http://distrosummit.org.
Important dates
===============

* Call for papers ends: Wednesday 30 September 2009
* Announcing the schedule: Friday 2 October 2009
* Conference begins: Monday 18 January 2010

Presentation types
==================

We will accept proposals for:

* 25-minute standard-length presentations;
* 50-minute long presentations.

Session lengths include time for audience questions. We intend for standard-length presentations to make up the vast majority of our presentations. If you plan on submitting a proposal for a long presentation, a willingness to present a standard-length presentation will impact positively on your proposal.

Submit a proposal
=================

To submit your proposal, we'll need the following information:

* Your name, contact details and a short biography;
* Your proposal title;
* Intended audience;
* An abstract;
* Presentation outline;
* Presentation type (standard-length or long).

To submit a proposal, or get more information, please write to cfp@distrosummit.org.

About the Distro Summit
=======================

The Distro Summit 2010 is a one-day developer conference with a strong focus on collaboration between free software distributions hosted at the linux.conf.au 2010 (http://www.lca2010.org.nz). In addition to a schedule of technical, social and policy talks, the Distro Summit provides an opportunity for developers, contributors and other interested people to meet in person and work together more closely. Previous similar events have featured speakers from around the world. They have also been extremely beneficial for developing key free software components and for improving collaboration and sharing between the different distributions.
Target Audience =============== The Distro Summit is (mainly) a technical event, but this does not mean that the only target audience is developers and maintainers of free software distributions: the event will feature talks ranging from development to real-world use cases, including marketing and the social aspects of maintaining free software distributions. -- Fabio Tranchitella on behalf of the Distro Summit organizers From ghelmer at palisadesys.com Mon Sep 14 13:28:26 2009 From: ghelmer at palisadesys.com (Guy Helmer) Date: Mon Sep 14 13:28:32 2009 Subject: Intermittent system hangs on 7.2-RELEASE-p1 In-Reply-To: <237c27100909112147h64f71585p2a97f2b48a510985@mail.gmail.com> References: <237c27100908261203g7e771400o2d9603220d1f1e0b@mail.gmail.com> <200909111102.14503.jhb@freebsd.org> <237c27100909111035y544e8c91hc7726fd6ef16e351@mail.gmail.com> <200909111506.47309.jhb@freebsd.org> <237c27100909111905y244924c1n93b4e4d9ceda44be@mail.gmail.com> <237c27100909112055i35612b4btbfbecb8b5dd1568c@mail.gmail.com> <4AAB1E34.2060908@elischer.org> <237c27100909112147h64f71585p2a97f2b48a510985@mail.gmail.com> Message-ID: <4AAE41DB.9050104@palisadesys.com> Linda Messerschmidt wrote: > Well, this is interesting. I got really frustrated with the other > approach, so I thought I'd thin a machine down absolutely as far as I > could, eliminate every possible source of delay, and see what happens. > I killed everything... cron, RPC, NFS, devd, gmon, nrpe, everything. > The Apache and its exerciser are now the only things running on the > machine, and the Apache is only touching an md0 swap device mounted on > /mnt. I *still* get the hangs. > > It hangs for all sorts of different periods, but the duration of the > stall is approximately inversely proportional to the chance of seeing > it. To get a short delay, you need wait only a little bit. If you > want a 2-3 second delay, you may have to wait 15-20 minutes. > On what sort of hardware is this hang occurring?
Several months ago I was trying to resolve an intermittent hang under FreeBSD 7. I collected a large number of crashdumps I created using the kernel debugger when I caught the machine hanging, but the backtraces were very inconsistent, and the hang was only occurring on Xeons with multithreading (older 2.8GHz and 3.6GHz Xeons). I was able to prevent the hang by setting "mach.hyperthreading_enabled=0" in /boot/loader.conf, but I am still not sure why it worked. Guy From Alexander at Leidinger.net Tue Sep 15 09:08:18 2009 From: Alexander at Leidinger.net (Alexander Leidinger) Date: Tue Sep 15 11:31:59 2009 Subject: Buffer overflow detected by REDZONE with linuxulator In-Reply-To: References: Message-ID: <20090915110806.13816i8eowbecwkc@webmail.leidinger.net> Quoting Alexander Best (from Wed, 09 Sep 2009 19:01:31 +0200 (CEST)): > hi there, CCing emulation@, this is better suited there. Full quote for the benefit of the emulation@ readers. Please drop hackers@ on reply. Thanks. > i've installed emulators/linux_dist-gentoo-stage3 and grabbed a snapshot from > the ltp git repository (http://ltp.sourceforge.net/). as expected some tests > failed because i'm using compat.linux.osrelease: 2.6.16 which is > still missing > a few linux syscalls, ipcs and ioctls. Are you interested in helping to update the corresponding FreeBSD wiki page? If yes, register there and we can hand out write access. > however i also noticed REDZONE reporting buffer overflows. i'm only > a user and > not a developer so i don't know if the ltp is to be blamed or if the problem > lies within the linuxulator. Probably the latter... > i'm running 9.0-CURRENT (r196879). as i mentioned before i'm using 2.6 linux > kernel emulation. here are the buffer overflow reports: Is your system running in 32bit or 64bit mode? Do you know which ltp-tests cause those messages to appear? Bye, Alexander. > Sep 9 14:12:42 otaku kernel: REDZONE: Buffer overflow detected. 9 bytes > corrupted after 0xcc28c483 (3 bytes allocated).
> Sep 9 14:12:42 otaku kernel: Allocation backtrace: > Sep 9 14:12:42 otaku kernel: #0 0xc0709aaa at redzone_setup+0x3a > Sep 9 14:12:42 otaku kernel: #1 0xc05bc673 at malloc+0x1c3 > Sep 9 14:12:42 otaku kernel: #2 0xc07428b8 at linux_getsockaddr+0x48 > Sep 9 14:12:42 otaku kernel: #3 0xc0742eb8 at linux_socketcall+0x178 > Sep 9 14:12:42 otaku kernel: #4 0xc0772f56 at syscall+0x2a6 > Sep 9 14:12:42 otaku kernel: #5 0xc07568b0 at Xint0x80_syscall+0x20 > Sep 9 14:12:42 otaku kernel: Free backtrace: > Sep 9 14:12:42 otaku kernel: #0 0xc0709a3a at redzone_check+0x17a > Sep 9 14:12:42 otaku kernel: #1 0xc05bc32d at free+0x5d > Sep 9 14:12:42 otaku kernel: #2 0xc0742ef0 at linux_socketcall+0x1b0 > Sep 9 14:12:42 otaku kernel: #3 0xc0772f56 at syscall+0x2a6 > Sep 9 14:12:42 otaku kernel: #4 0xc07568b0 at Xint0x80_syscall+0x20 > Sep 9 14:20:08 otaku kernel: REDZONE: Buffer overflow detected. 4 bytes > corrupted after 0xcc2538ea (106 bytes allocated). > Sep 9 14:20:08 otaku kernel: Allocation backtrace: > Sep 9 14:20:08 otaku kernel: #0 0xc0709aaa at redzone_setup+0x3a > Sep 9 14:20:08 otaku kernel: #1 0xc05bc673 at malloc+0x1c3 > Sep 9 14:20:08 otaku kernel: #2 0xc063a902 at unp_connect+0x162 > Sep 9 14:20:08 otaku kernel: #3 0xc063d6c9 at uipc_connect+0x49 > Sep 9 14:20:08 otaku kernel: #4 0xc062fde2 at soconnect+0x52 > Sep 9 14:20:08 otaku kernel: #5 0xc0638eb6 at kern_connect+0x96 > Sep 9 14:20:08 otaku kernel: #6 0xc0742c7b at linux_connect+0x3b > Sep 9 14:20:08 otaku kernel: #7 0xc0742f22 at linux_socketcall+0x1e2 > Sep 9 14:20:08 otaku kernel: #8 0xc0772f56 at syscall+0x2a6 > Sep 9 14:20:08 otaku kernel: #9 0xc07568b0 at Xint0x80_syscall+0x20 > Sep 9 14:20:08 otaku kernel: Free backtrace: > Sep 9 14:20:08 otaku kernel: #0 0xc0709a3a at redzone_check+0x17a > Sep 9 14:20:08 otaku kernel: #1 0xc05bc32d at free+0x5d > Sep 9 14:20:08 otaku kernel: #2 0xc063bfb2 at uipc_detach+0x242 > Sep 9 14:20:08 otaku kernel: #3 0xc0632a7e at sofree+0x22e > Sep 9 14:20:08 otaku kernel: 
#4 0xc0632f26 at soclose+0x386 > Sep 9 14:20:08 otaku kernel: #5 0xc0617c49 at soo_close+0x29 > Sep 9 14:20:08 otaku kernel: #6 0xc0598b13 at _fdrop+0x43 > Sep 9 14:20:08 otaku kernel: #7 0xc059ab90 at closef+0x290 > Sep 9 14:20:08 otaku kernel: #8 0xc059af22 at kern_close+0x102 > Sep 9 14:20:08 otaku kernel: #9 0xc059b09a at close+0x1a > Sep 9 14:20:08 otaku kernel: #10 0xc0772f56 at syscall+0x2a6 > Sep 9 14:20:08 otaku kernel: #11 0xc07568b0 at Xint0x80_syscall+0x20 > Sep 9 14:20:09 otaku kernel: REDZONE: Buffer overflow detected. 4 bytes > corrupted after 0xccc653ea (106 bytes allocated). > Sep 9 14:20:09 otaku kernel: Allocation backtrace: > Sep 9 14:20:09 otaku kernel: #0 0xc0709aaa at redzone_setup+0x3a > Sep 9 14:20:09 otaku kernel: #1 0xc05bc673 at malloc+0x1c3 > Sep 9 14:20:09 otaku kernel: #2 0xc063a902 at unp_connect+0x162 > Sep 9 14:20:09 otaku kernel: #3 0xc063d6c9 at uipc_connect+0x49 > Sep 9 14:20:09 otaku kernel: #4 0xc062fde2 at soconnect+0x52 > Sep 9 14:20:09 otaku kernel: #5 0xc0638eb6 at kern_connect+0x96 > Sep 9 14:20:09 otaku kernel: #6 0xc0742c7b at linux_connect+0x3b > Sep 9 14:20:09 otaku kernel: #7 0xc0742f22 at linux_socketcall+0x1e2 > Sep 9 14:20:09 otaku kernel: #8 0xc0772f56 at syscall+0x2a6 > Sep 9 14:20:09 otaku kernel: #9 0xc07568b0 at Xint0x80_syscall+0x20 > Sep 9 14:20:09 otaku kernel: Free backtrace: > Sep 9 14:20:09 otaku kernel: #0 0xc0709a3a at redzone_check+0x17a > Sep 9 14:20:09 otaku kernel: #1 0xc05bc32d at free+0x5d > Sep 9 14:20:09 otaku kernel: #2 0xc063bfb2 at uipc_detach+0x242 > Sep 9 14:20:09 otaku kernel: #3 0xc0632a7e at sofree+0x22e > Sep 9 14:20:09 otaku kernel: #4 0xc0632f26 at soclose+0x386 > Sep 9 14:20:09 otaku kernel: #5 0xc0617c49 at soo_close+0x29 > Sep 9 14:20:09 otaku kernel: #6 0xc0598b13 at _fdrop+0x43 > Sep 9 14:20:09 otaku kernel: #7 0xc059ab90 at closef+0x290 > Sep 9 14:20:09 otaku kernel: #8 0xc059af22 at kern_close+0x102 > Sep 9 14:20:09 otaku kernel: #9 0xc059b09a at close+0x1a > Sep 9 14:20:09 
otaku kernel: #10 0xc0772f56 at syscall+0x2a6 > Sep 9 14:20:09 otaku kernel: #11 0xc07568b0 at Xint0x80_syscall+0x20 > Sep 9 14:20:09 otaku kernel: REDZONE: Buffer overflow detected. 4 bytes > corrupted after 0xcf45a9ea (106 bytes allocated). > Sep 9 14:20:09 otaku kernel: Allocation backtrace: > Sep 9 14:20:09 otaku kernel: #0 0xc0709aaa at redzone_setup+0x3a > Sep 9 14:20:09 otaku kernel: #1 0xc05bc673 at malloc+0x1c3 > Sep 9 14:20:09 otaku kernel: #2 0xc063a902 at unp_connect+0x162 > Sep 9 14:20:09 otaku kernel: #3 0xc063d6c9 at uipc_connect+0x49 > Sep 9 14:20:09 otaku kernel: #4 0xc062fde2 at soconnect+0x52 > Sep 9 14:20:09 otaku kernel: #5 0xc0638eb6 at kern_connect+0x96 > Sep 9 14:20:09 otaku kernel: #6 0xc0742c7b at linux_connect+0x3b > Sep 9 14:20:09 otaku kernel: #7 0xc0742f22 at linux_socketcall+0x1e2 > Sep 9 14:20:09 otaku kernel: #8 0xc0772f56 at syscall+0x2a6 > Sep 9 14:20:09 otaku kernel: #9 0xc07568b0 at Xint0x80_syscall+0x20 > Sep 9 14:20:09 otaku kernel: Free backtrace: > Sep 9 14:20:09 otaku kernel: #0 0xc0709a3a at redzone_check+0x17a > Sep 9 14:20:09 otaku kernel: #1 0xc05bc32d at free+0x5d > Sep 9 14:20:09 otaku kernel: #2 0xc063bfb2 at uipc_detach+0x242 > Sep 9 14:20:09 otaku kernel: #3 0xc0632a7e at sofree+0x22e > Sep 9 14:20:09 otaku kernel: #4 0xc0632f26 at soclose+0x386 > Sep 9 14:20:09 otaku kernel: #5 0xc0617c49 at soo_close+0x29 > Sep 9 14:20:09 otaku kernel: #6 0xc0598b13 at _fdrop+0x43 > Sep 9 14:20:09 otaku kernel: #7 0xc059ab90 at closef+0x290 > Sep 9 14:20:09 otaku kernel: #8 0xc059b55a at fdfree+0x3ea > Sep 9 14:20:09 otaku kernel: #9 0xc05a57b3 at exit1+0x513 > Sep 9 14:20:09 otaku kernel: #10 0xc05d17f4 at sigexit+0xa14 > Sep 9 14:20:09 otaku kernel: #11 0xc05d19fd at postsig+0x1dd > Sep 9 14:20:09 otaku kernel: #12 0xc0608fca at ast+0x35a > Sep 9 14:20:09 otaku kernel: #13 0xc0757174 at doreti_ast+0x17 > > cheers. 
> alex -- Fifth Law of Procrastination: Procrastination avoids boredom; one never has the feeling that there is nothing important to do. http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137 From to.my.trociny at gmail.com Tue Sep 15 20:15:08 2009 From: to.my.trociny at gmail.com (Mikolaj Golub) Date: Tue Sep 15 20:15:41 2009 Subject: 7.1 panicked removing namecache entry from cache Message-ID: <868wgg2ce3.fsf@kopusha.onet> Hi, Today we had a VFS-related panic on 7.1-RELEASE-p5. Does anybody have any idea? Might it have already been fixed in later versions? Fatal trap 12: page fault while in kernel mode cpuid = 0; apic id = 00 fault virtual address = 0x0 fault code = supervisor write, page not present instruction pointer = 0x20:0xc07fd34b stack pointer = 0x28:0xe6a97bc8 frame pointer = 0x28:0xe6a97bdc code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 49 (vnlru) trap number = 12 panic: page fault cpuid = 2 Uptime: 11h19m41s Physical memory: 3059 MB Dumping 275 MB: 260 244 228 212 196 180 164 148 132 116 100 84 68 52 36 20 4 (kgdb) where #0 doadump () at pcpu.h:196 #1 0xc07910a7 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418 #2 0xc0791379 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:574 #3 0xc0aa7bcc in trap_fatal (frame=0xe6a97b88, eva=0) at /usr/src/sys/i386/i386/trap.c:939 #4 0xc0aa7e50 in trap_pfault (frame=0xe6a97b88, usermode=0, eva=0) at /usr/src/sys/i386/i386/trap.c:852 #5 0xc0aa880c in trap (frame=0xe6a97b88) at /usr/src/sys/i386/i386/trap.c:530 #6 0xc0a8e67b in calltrap () at /usr/src/sys/i386/i386/exception.s:159 #7 0xc07fd34b in cache_zap (ncp=0xcc33783c) at /usr/src/sys/kern/vfs_cache.c:276 #8 0xc07fd57c in cache_purge (vp=0xc890533c) at /usr/src/sys/kern/vfs_cache.c:613 #9 0xc080df18 in vgonel (vp=0xc890533c) at /usr/src/sys/kern/vfs_subr.c:2545 #10 0xc081174d in vnlru_free (count=270) at /usr/src/sys/kern/vfs_subr.c:870 #11 0xc0811ddc in vnlru_proc () at /usr/src/sys/kern/vfs_subr.c:733 #12 0xc076cc19 in fork_exit (callout=0xc0811cf0 , arg=0x0, frame=0xe6a97d38) at /usr/src/sys/kern/kern_fork.c:804 #13 0xc0a8e6f0 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:264 267 static void 268 cache_zap(ncp) 269 struct namecache *ncp; 270 { 271 struct vnode *vp; 272 273 mtx_assert(&cache_lock, MA_OWNED); 274 CTR2(KTR_VFS, "cache_zap(%p) vp %p", ncp, ncp->nc_vp); 275 vp = NULL; 276 LIST_REMOVE(ncp, nc_hash); 277 LIST_REMOVE(ncp, nc_src); 278 if (LIST_EMPTY(&ncp->nc_dvp->v_cache_src)) { 279 vp = ncp->nc_dvp; 280 numcachehv--; 281 } 282 if (ncp->nc_vp) { 283 TAILQ_REMOVE(&ncp->nc_vp->v_cache_dst, ncp, nc_dst); 284 ncp->nc_vp->v_dd = NULL; 285 } else { 286 TAILQ_REMOVE(&ncneg, ncp, nc_dst); 287 numneg--; 288 } 289 numcache--; 290 cache_free(ncp); 291 if (vp) 292 vdrop(vp); 293 } 603 void 604 cache_purge(vp) 605 struct vnode *vp; 606 { 607 608 CTR1(KTR_VFS, "cache_purge(%p)", vp); 609 CACHE_LOCK(); 610 while (!LIST_EMPTY(&vp->v_cache_src)) 611 cache_zap(LIST_FIRST(&vp->v_cache_src)); 612 while (!TAILQ_EMPTY(&vp->v_cache_dst)) 613 cache_zap(TAILQ_FIRST(&vp->v_cache_dst)); 614 vp->v_dd = NULL; 615 CACHE_UNLOCK(); 616 } (kgdb) fr 8 #8 0xc07fd57c in cache_purge (vp=0xc890533c) at 
/usr/src/sys/kern/vfs_cache.c:613 613 struct namecache *ncp, *nnp; (kgdb) p *vp $1 = {v_type = VREG, v_tag = 0xc0b37e38 "ufs", v_op = 0xc0bfbdc0, v_data = 0x0, v_mount = 0x0, v_nmntvnodes = {tqe_next = 0xcc342450, tqe_prev = 0xcc31e350}, v_un = {vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0, vu_yield = 0}, v_hashlist = {le_next = 0x0, le_prev = 0xc65e99fc}, v_hash = 6643674, v_cache_src = {lh_first = 0x0}, v_cache_dst = {tqh_first = 0xcc33783c, tqh_last = 0xcc33784c}, v_dd = 0x0, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, v_lock = { lk_object = {lo_name = 0xc0b37e38 "ufs", lo_type = 0xc0b37e38 "ufs", lo_flags = 70844416, lo_witness_data = {lod_list = {stqe_next = 0x0}, lod_witness = 0x0}}, lk_interlock = 0xc0c42d10, lk_flags = 262208, lk_sharecount = 0, lk_waitcount = 0, lk_exclusivecount = 1, lk_prio = 80, lk_timo = 51, lk_lockholder = 0xc64bf8c0, lk_newlock = 0x0}, v_interlock = {lock_object = { lo_name = 0xc0b418b9 "vnode interlock", lo_type = 0xc0b418b9 "vnode interlock", lo_flags = 16973824, lo_witness_data = {lod_list = {stqe_next = 0x0}, lod_witness = 0x0}}, mtx_lock = 4, mtx_recurse = 0}, v_vnlock = 0xc8905394, v_holdcnt = 1, v_usecount = 0, v_iflag = 128, v_vflag = 0, v_writecount = 0, v_freelist = {tqe_next = 0xcc30acf0, tqe_prev = 0xc0c4f3ac}, v_bufobj = { bo_mtx = 0xc89053c4, bo_clean = {bv_hd = {tqh_first = 0x0, tqh_last = 0xc8905400}, bv_root = 0x0, bv_cnt = 0}, bo_dirty = {bv_hd = {tqh_first = 0x0, tqh_last = 0xc8905410}, bv_root = 0x0, bv_cnt = 0}, bo_numoutput = 0, bo_flag = 0, bo_ops = 0xc0be8e40, bo_bsize = 16384, bo_object = 0x0, bo_synclist = {le_next = 0x0, le_prev = 0xc890520c}, bo_private = 0xc890533c, __bo_vnode = 0xc890533c}, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0} (kgdb) fr 7 #7 0xc07fd34b in cache_zap (ncp=0xcc33783c) at /usr/src/sys/kern/vfs_cache.c:276 276 LIST_REMOVE(ncp, nc_hash); (kgdb) p *ncp $2 = {nc_hash = {le_next = 0x0, le_prev = 0x0}, nc_src = {le_next = 0xcc31650c, le_prev = 0xcc2c51e4}, 
nc_dst = {tqe_next = 0x0, tqe_prev = 0x0}, nc_dvp = 0x0, nc_vp = 0x0, nc_flag = 0 '\0', nc_nlen = 0 '\0', nc_name = 0xcc33785e ""} -- Mikolaj Golub From auryn at zirakzigil.org Tue Sep 15 21:11:32 2009 From: auryn at zirakzigil.org (Giulio Ferro) Date: Tue Sep 15 21:11:39 2009 Subject: ZFS group ownership Message-ID: <4AAFFEBB.4030907@zirakzigil.org> I don't know if this is the correct list to discuss this matter; if not, I apologize in advance. I've always understood group ownership as a way to allow members of the same group to operate on files / folders which belong to that group, while leaving out others. Suppose we have a directory /root/test (UFS file system). I do this: cd /root chmod -R 770 test chown -R www:www test (I use group www as an example, since it's already present on a base system) My user "gferro" also belongs to group www and has umask 007 su - gferro touch qweq mkdir asda If I now look at the file and directory I've just created: --------------------------------------------------------------- %ls -la total 6 drwxrwx--- 3 www www 512 Sep 12 13:39 . drwxr-xr-x 4 root wheel 512 Sep 12 13:02 .. drwxrwx--- 2 gferro www 512 Sep 12 13:39 asda -rw-rw---- 1 gferro www 0 Sep 12 13:38 qweq --------------------------------------------------------------- I see that both belong to group www, even though gferro's primary group is "gferro": --------------------------------------------------------------- id gferro uid=1001(gferro) gid=1001(gferro) groups=1001(gferro),80(www) --------------------------------------------------------------- This means that all users who belong to group "www" will be able to work with the files and directories I've created. Now I try to do the same on a ZFS partition on the same machine. This is what I see with ls: --------------------------------------------------------------- ls -la total 4 drwxrwx--- 3 www www 4 Sep 12 13:43 . drwxr-xr-x 4 root wheel 4 Sep 12 13:43 ..
drwxrwx--- 2 gferro gferro 2 Sep 12 13:43 asda -rw-rw---- 1 gferro gferro 0 Sep 12 13:43 qweq --------------------------------------------------------------- As you can see, both file and directory belongs now to "gferro" and not "www". This means that other users won't even be able to read my files / dir, let alone modify them. What I ask now is: is this a bug or a feature? How can I achieve my goal in ZFS, that is allowing members of the same group to operate with the files / dirs they create? Thanks in advance.
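The difference shown in the two listings can also be checked from a program. Below is a minimal C sketch, assuming only POSIX stat(2)/open(2); the helper name inherits_dir_group() and the scratch file name are hypothetical. It creates a throwaway file in a directory and compares the new file's group with the directory's. The answer is only informative when the directory's group differs from the caller's effective group (as with www versus gferro above); otherwise both rules produce the same group.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a scratch file in `dir` and report whether it
 * received the directory's group (returns 1, the BSD/UFS behaviour) or some
 * other group, such as the creator's primary group as in the ZFS listing
 * (returns 0). Returns -1 on any error. The scratch file is removed again.
 */
static int
inherits_dir_group(const char *dir)
{
	char path[4096];
	struct stat dst, fst;
	int fd, n, rv;

	if (stat(dir, &dst) == -1)
		return -1;

	n = snprintf(path, sizeof(path), "%s/.gid-probe.%ld", dir, (long)getpid());
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	/* O_EXCL: never reuse or clobber an existing file. */
	fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (fd == -1)
		return -1;

	rv = (fstat(fd, &fst) == 0) ? (fst.st_gid == dst.st_gid) : -1;

	close(fd);
	unlink(path);
	return rv;
}
```

Run as gferro against /root/test and against the ZFS directory, it would be expected to return 1 and 0 respectively, going by the listings above.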
From ben at b1c1l1.com Tue Sep 15 22:37:00 2009 From: ben at b1c1l1.com (Benjamin Lee) Date: Tue Sep 15 22:37:07 2009 Subject: ZFS group ownership In-Reply-To: <4AAB8AD0.5010302@zirakzigil.org> References: <4AAB8AD0.5010302@zirakzigil.org> Message-ID: <4AB01700.8060602@b1c1l1.com> On 09/12/2009 04:49 AM, Giulio Ferro wrote: [...] > How can I achieve my goal in ZFS, that is allowing members of the same > group to operate with the files / dirs they create? Does setting the setgid bit on the directory have any effect? -- Benjamin Lee http://www.b1c1l1.com/ From nate at thatsmathematics.com Tue Sep 15 22:32:17 2009 From: nate at thatsmathematics.com (Nate Eldredge) Date: Tue Sep 15 22:41:44 2009 Subject: ZFS group ownership In-Reply-To: <4AAB8AD0.5010302@zirakzigil.org> References: <4AAB8AD0.5010302@zirakzigil.org> Message-ID: On Sat, 12 Sep 2009, Giulio Ferro wrote: > I don't know if this is the correct list to discuss this matter, if not > I apologize in advance. freebsd-questions might have been better, but I don't think you're too far off. It wasn't necessary to post three times though :) [On UFS, files are created with the same group as the directory that contains them. On ZFS, they are created with the primary group of the user who creates them.] > What I ask now is: is this a bug or a feature? Both, I think :) The behavior you describe on UFS (group comes from the directory) is standard for BSD-based systems like FreeBSD. On SysV-based systems, however, the default is that the group comes from the user, as you describe on ZFS. ZFS was originally developed for Solaris, a descendent of SysV, so it's not surprising that it also has this behavior.
However, this is at least a documentation bug, since the open(2) man page describes the BSD behavior without mentioning exceptions. > How can I achieve my goal in ZFS, that is allowing members of the same > group to operate with the files / dirs they create? On SysV, you can get BSD-type behavior by setting the sgid bit on the directory in question, e.g. "chmod g+s dir". Then new files will inherit their group from the directory. I suspect this will work on FreeBSD/ZFS too even though "chmod g+s" on a directory is undocumented. -- Nate Eldredge nate@thatsmathematics.com From ady at freebsd.ady.ro Wed Sep 16 10:37:19 2009 From: ady at freebsd.ady.ro (Adrian Penisoara) Date: Wed Sep 16 10:37:26 2009 Subject: ZFS group ownership In-Reply-To: References: <4AAB8AD0.5010302@zirakzigil.org> Message-ID: <78cb3d3f0909160336m2d1f93dsad4aafb692395a80@mail.gmail.com> Hi, On Wed, Sep 16, 2009 at 12:18 AM, Nate Eldredge wrote: [...] > [On UFS, files are created with the same group as the directory that > contains them. On ZFS, they are created with the primary group of the user > who creates them.] > >> What I ask now is: is this a bug or a feature? > > Both, I think :) > > The behavior you describe on UFS (group comes from the directory) is > standard for BSD-based systems like FreeBSD. On SysV-based systems, > however, the default is that the group comes from the user, as you describe > on ZFS. ZFS was originally developed for Solaris, a descendent of SysV, so > it's not surprising that it also has this behavior. However, this is at > least a documentation bug, since the open(2) man page describes the BSD > behavior without mentioning exceptions. Is the ownership of the new file decided by the open() syscall or by the filesystem layer? On a superficial lookup through the sources it appears a filesystem layer choice...
Which of the following would then be the best option (also taking POLA into account): * leave things as they are * make ZFS under FreeBSD behave the way open(2) describes * have a new ZFS property govern the behavior and default to one of the above Thanks, Adrian Penisoara EnterpriseBSD From romain at blogreen.org Wed Sep 16 11:12:38 2009 From: romain at blogreen.org (Romain Tartière) Date: Wed Sep 16 11:12:46 2009 Subject: ZFS group ownership In-Reply-To: References: <4AAB8AD0.5010302@zirakzigil.org> Message-ID: <20090916111237.GB1700@blogreen.org> On Tue, Sep 15, 2009 at 03:18:41PM -0700, Nate Eldredge wrote: > >What I ask now is: is this a bug or a feature? > > Both, I think :) Or neither, just different implementations of the same open() function, both complying with the Open Group Base Specifications ;-) Quoting http://www.opengroup.org/onlinepubs/009695399/functions/open.html ----------------8<------------------------------------------------------ O_CREAT [...] the file shall be created; the user ID of the file shall be set to the effective user ID of the process; the group ID of the file shall be set to the group ID of the file's parent directory or to the effective group ID of the process [...] Implementations shall provide a way to initialize the file's group ID to the group ID of the parent directory. Implementations may, but need not, provide an implementation-defined way to initialize the file's group ID to the effective group ID of the calling process. [...] The POSIX.1-1990 standard required that the group ID of a newly created file be set to the group ID of its parent directory or to the effective group ID of the creating process. FIPS 151-2 required that implementations provide a way to have the group ID be set to the group ID of the containing directory, but did not prohibit implementations also supporting a way to set the group ID to the effective group ID of the creating process.
Conforming applications should not assume which group ID will be used. If it matters, an application can use chown() to set the group ID after the file is created, or determine under what conditions the implementation will set the desired group ID. ----------------8<------------------------------------------------------ That being said, having two different behaviours on the same system, even if you "should not assume which group ID will be used", is kind of weird. -- Romain Tartière http://romain.blogreen.org/ pgp: 8DAB A124 0DA4 7024 F82A E748 D8E9 A33F FF56 FF43 (ID: 0xFF56FF43) (plain text =non-HTML= PGP/GPG encrypted/signed e-mail much appreciated) From hch at infradead.org Wed Sep 16 13:00:53 2009 From: hch at infradead.org (Christoph Hellwig) Date: Wed Sep 16 13:01:00 2009 Subject: ZFS group ownership In-Reply-To: <78cb3d3f0909160336m2d1f93dsad4aafb692395a80@mail.gmail.com> References: <4AAB8AD0.5010302@zirakzigil.org> <78cb3d3f0909160336m2d1f93dsad4aafb692395a80@mail.gmail.com> Message-ID: <20090916130044.GA2670@infradead.org> On Wed, Sep 16, 2009 at 12:36:57PM +0200, Adrian Penisoara wrote: > Which of the following would then be the best option (also taking POLA > into account): > * leave things as they are > * make ZFS under FreeBSD behave the way open(2) describes > * have a new ZFS property govern the behavior and default to one of the above Btw, on Linux all the common filesystems support the SysV behaviour by default but have a mount option bsdgroups/grpid that turns on the BSD behaviour. I would recommend you do the same just with reversed signs on FreeBSD. Having different default behaviour for different filesystems on a single OS is generally a bad idea.
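Following the POSIX advice quoted above (use chown() after creation when the group matters), an application can get the BSD result on either kind of file system. Below is a minimal C sketch under that assumption; create_in_dir_group() is a hypothetical helper, not an existing API. It calls fchown() only when the file system picked a different group, and for an unprivileged user that call succeeds only if the user is a member of the directory's group (as gferro is of www).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `name` in `dir` (O_EXCL) and make its group
 * match the directory's, whichever rule the file system applied at
 * creation time. Returns an open descriptor, or -1 with errno set.
 */
static int
create_in_dir_group(const char *dir, const char *name, mode_t mode)
{
	char path[4096];
	struct stat dst, fst;
	int fd, n, saved;

	if (stat(dir, &dst) == -1)
		return -1;

	n = snprintf(path, sizeof(path), "%s/%s", dir, name);
	if (n < 0 || (size_t)n >= sizeof(path)) {
		errno = ENAMETOOLONG;
		return -1;
	}

	fd = open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
	if (fd == -1)
		return -1;

	if (fstat(fd, &fst) == -1)
		goto fail;

	/*
	 * Only fix things up if the file system chose another group
	 * (e.g. the creator's primary group). (uid_t)-1 leaves the owner
	 * unchanged; an unprivileged caller must belong to dst.st_gid.
	 */
	if (fst.st_gid != dst.st_gid &&
	    fchown(fd, (uid_t)-1, dst.st_gid) == -1)
		goto fail;

	return fd;

fail:
	saved = errno;
	close(fd);
	unlink(path);	/* don't leave a file with the wrong group behind */
	errno = saved;
	return -1;
}
```

The permission bits are still filtered through the umask, so with gferro's umask 007 a mode of 0660 stays 0660.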
From linda.messerschmidt at gmail.com Wed Sep 16 13:52:49 2009 From: linda.messerschmidt at gmail.com (Linda Messerschmidt) Date: Wed Sep 16 13:52:56 2009 Subject: ZFS group ownership In-Reply-To: <20090916130044.GA2670@infradead.org> References: <4AAB8AD0.5010302@zirakzigil.org> <78cb3d3f0909160336m2d1f93dsad4aafb692395a80@mail.gmail.com> <20090916130044.GA2670@infradead.org> Message-ID: <237c27100909160652u4bb141fcl6f29385ea9bad03e@mail.gmail.com> On Wed, Sep 16, 2009 at 9:00 AM, Christoph Hellwig wrote: > Btw, on Linux all the common filesystems support the SysV behaviour > by default but have a mount option bsdgroups/grpid that turns on the BSD > behaviour. I would recommend you do the same just with reversed signs > on FreeBSD. Having different default behaviour for different > filesystems on a single OS is generally a bad idea. I agree; I have noticed a lot of confusion with this as well. In our case, we mount some ZFS and UFS2 filesystems over NFS, and the NFS client machine has no way of knowing what the NFS server is going to use for a default group. It would be fantastic if there were a way to get consistent behavior. However, some of the ZFS filesystems in question are exported from a Solaris machine, and on Solaris, I believe it's the NFS client that's expected to set the grpid flag, so in order to reliably help with this case, this might have to be a client-side NFS flag on FreeBSD as well. Otherwise it may wind up working differently for local ZFS filesystems versus ones mounted over NFS.
From auryn at zirakzigil.org Wed Sep 16 15:15:50 2009 From: auryn at zirakzigil.org (Giulio Ferro) Date: Wed Sep 16 15:15:58 2009 Subject: ZFS group ownership In-Reply-To: <78cb3d3f0909160336m2d1f93dsad4aafb692395a80@mail.gmail.com> References: <4AAB8AD0.5010302@zirakzigil.org> <78cb3d3f0909160336m2d1f93dsad4aafb692395a80@mail.gmail.com> Message-ID: <4AB0FC8A.3090604@zirakzigil.org> Adrian Penisoara wrote: > Is the ownership of the new file decided by the open() syscall or by > the filesystem layer ? > On a superficial lookup through the sources it appears a filesystem > layer choice... > > Which of the following would then be the best option (also taking POLA > into account): > * leave things as they are > * make ZFS under FreeBSD behave the way open(2) describes > * have a new ZFS property govern the behavior and default to one of the above > > Thanks, > Adrian Penisoara > EnterpriseBSD > Thanks all for answering (sorry for the multiple posts, I was tuning my mail server). I believe that on the same FreeBSD system the behavior should be consistent across different mounts. So in my opinion ZFS should conform to UFS (or UFS to ZFS, if that's desirable). The best thing would be to have a sysctl tunable to choose between them (sysv5 / bsd). BSD should be the default, since it makes more sense for workgroups... From alfred at freebsd.org Thu Sep 17 02:00:58 2009 From: alfred at freebsd.org (Alfred Perlstein) Date: Thu Sep 17 02:01:06 2009 Subject: script(1) issue/question Message-ID: <20090917020058.GD21946@elvis.mu.org> [[ peter cc'd cause he seemed to add the original "exec a non-shell option" to script(1) ]] Hello all, I noticed that when running "script" and passing it a program to exec, ^Z does not seem to work (although ^C does). I'm trying to figure out a workaround and what I was going to do was add ISIG to the term flags when spawning a non-shell utility. (should I also check /etc/shells to help preserve POLA further?) Any pointers on this?
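The ISIG idea can be sketched as follows, assuming script(1) switches the user's terminal to raw mode with something like cfmakeraw(3); raw_keep_signals() is a hypothetical helper name, not code from script(1). cfmakeraw() clears ISIG along with ICANON and ECHO, so ^Z reaches the pty as an ordinary byte; setting ISIG again makes the outer terminal's line discipline generate the signal for script itself.

```c
#define _DEFAULT_SOURCE		/* glibc: expose cfmakeraw(); harmless elsewhere */
#include <termios.h>

/*
 * Hypothetical helper: derive raw-mode settings for the user's terminal,
 * but keep ISIG so the line discipline still turns ^C, ^Z and ^\ into
 * SIGINT, SIGTSTP and SIGQUIT for script itself instead of passing them
 * through to the pty as plain bytes.
 */
static void
raw_keep_signals(struct termios *t)
{
	cfmakeraw(t);		/* clears ICANON, ECHO, ISIG, IEXTEN, ... */
	t->c_lflag |= ISIG;	/* turn signal generation back on */
}
```

A caller would apply it between tcgetattr(3) and tcsetattr(3) on STDIN_FILENO. One trade-off worth weighing: with ISIG set, ^C would presumably also be delivered to script rather than passed through the pty to the child, which is a behaviour change for interactive shells.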
Would this be a good idea, or a bad idea? Terminal gurus give me a hand please! :) please ignore the sigflg part at the top for now, prepping for possible cli option to avoid POLA breakage. Is there a way to detect ^Z or other terminal signals and propagate them to the child in a better way? -- - Alfred Perlstein .- AMA, VMOA #5191, 03 vmax, 92 gs500, 85 ch250 .- FreeBSD committer -------------- next part -------------- A non-text attachment was scrubbed... Name: script1.diff Type: text/x-diff Size: 845 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-hackers/attachments/20090917/82e219aa/script1.bin From is at rambler-co.ru Thu Sep 17 10:33:33 2009 From: is at rambler-co.ru (Igor Sysoev) Date: Thu Sep 17 10:33:40 2009 Subject: fcntl(F_RDAHEAD) Message-ID: <20090917101526.GF57619@rambler-co.ru> Hi, nginx-0.8.15 can use completely non-blocking sendfile() using SF_NODISKIO flag. When sendfile() returns EBUSY, nginx calls aio_read() to read single byte. The first aio_read() preloads the first 128K part of a file in VM cache, however, all successive aio_read()s preload just 16K parts of the file. This makes non-blocking sendfile() usage ineffective for files larger than 128K. I've created a small patch for Darwin compatible F_RDAHEAD fcntl: fcntl(fd, F_RDAHEAD, preload_size) There is small incompatibilty: Darwin's fcntl allows just to enable/disable read ahead, while the proposed patch allows to set exact preload size. Currently the preload size affects vn_read() code path only and does not affect on sendfile() code path. However, it can be easy extended on sendfile() part too. The preload size is still limited by sysctl vfs.read_max. The patch is against FreeBSD 7.2 and was tested on FreeBSD 7.2-STABLE only.
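A userland caller of the proposed fcntl(fd, F_RDAHEAD, preload_size) interface might look like the sketch below. It assumes only that <fcntl.h> defines F_RDAHEAD (for example, a kernel and headers carrying Igor's patch; on Darwin the third argument acts as an on/off switch rather than a size). Where the constant is missing, the request is reported as ENOTSUP and skipped, so the read itself still works. set_readahead() and read_with_readahead() are hypothetical helper names.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Ask for a read-ahead window of `bytes` on fd. F_RDAHEAD is used only
 * if <fcntl.h> defines it (assumption: a kernel with the proposed
 * patch); otherwise ENOTSUP is returned so callers can treat the call
 * as a hint. Returns 0 or an errno value.
 */
int
set_readahead(int fd, int bytes)
{
	assert(bytes >= 0);
#ifdef F_RDAHEAD
	if (fcntl(fd, F_RDAHEAD, bytes) == -1)
		return (errno);
	return (0);
#else
	(void)fd;
	(void)bytes;
	return (ENOTSUP);
#endif
}

/*
 * Hypothetical helper: open path, request read-ahead, then read up to
 * len bytes sequentially. Returns the byte count or -1 with errno set.
 */
ssize_t
read_with_readahead(const char *path, char *buf, size_t len, int bytes)
{
	size_t done = 0;
	ssize_t n;
	int fd, err;

	if ((fd = open(path, O_RDONLY)) == -1)
		return (-1);
	err = set_readahead(fd, bytes);
	if (err != 0 && err != ENOTSUP)
		fprintf(stderr, "read-ahead request ignored: errno %d\n", err);
	while (done < len) {
		n = read(fd, buf + done, len - done);
		if (n == -1) {
			if (errno == EINTR)
				continue;
			err = errno;
			close(fd);
			errno = err;
			return (-1);
		}
		if (n == 0)	/* end of file */
			break;
		done += (size_t)n;
	}
	close(fd);
	return ((ssize_t)done);
}
```

Treating the fcntl() result as a hint keeps the read path portable. This matters because the same constant name may carry different semantics on different systems, a concern raised in the replies that follow.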
-- Igor Sysoev http://sysoev.ru/en/
-------------- next part --------------
--- sys/sys/fcntl.h	2009-06-02 19:05:17.000000000 +0400
+++ sys/sys/fcntl.h	2009-09-12 20:29:34.000000000 +0400
@@ -118,6 +118,10 @@
 #if __BSD_VISIBLE
 /* Attempt to bypass buffer cache */
 #define	O_DIRECT	0x00010000
+#ifdef _KERNEL
+/* Read ahead */
+#define	O_RDAHEAD	0x00020000
+#endif
 #endif
 
 /*
@@ -187,6 +191,7 @@
 #define	F_SETLK		12		/* set record locking information */
 #define	F_SETLKW	13		/* F_SETLK; wait if blocked */
 #define	F_SETLK_REMOTE	14		/* debugging support for remote locks */
+#define	F_RDAHEAD	15		/* read ahead */
 
 /* file descriptor flags (F_GETFD, F_SETFD) */
 #define	FD_CLOEXEC	1		/* close-on-exec flag */
--- sys/kern/vfs_vnops.c	2009-06-02 19:05:00.000000000 +0400
+++ sys/kern/vfs_vnops.c	2009-09-12 20:24:00.000000000 +0400
@@ -305,6 +305,9 @@
 sequential_heuristic(struct uio *uio, struct file *fp)
 {
 
+	if (fp->f_flag & O_RDAHEAD)
+		return(fp->f_seqcount << IO_SEQSHIFT);
+
 	if ((uio->uio_offset == 0 && fp->f_seqcount > 0) ||
 	    uio->uio_offset == fp->f_nextoff) {
 		/*
--- sys/kern/kern_descrip.c	2009-08-28 18:50:11.000000000 +0400
+++ sys/kern/kern_descrip.c	2009-09-12 20:23:36.000000000 +0400
@@ -411,6 +411,7 @@
 	u_int newmin;
 	int error, flg, tmp;
 	int vfslocked;
+	uint64_t bsize;
 
 	vfslocked = 0;
 	error = 0;
@@ -694,6 +695,31 @@
 		vfslocked = 0;
 		fdrop(fp, td);
 		break;
+
+	case F_RDAHEAD:
+		FILEDESC_SLOCK(fdp);
+		if ((fp = fdtofp(fd, fdp)) == NULL) {
+			FILEDESC_SUNLOCK(fdp);
+			error = EBADF;
+			break;
+		}
+		if (fp->f_type != DTYPE_VNODE) {
+			FILEDESC_SUNLOCK(fdp);
+			error = EBADF;
+			break;
+		}
+		FILE_LOCK(fp);
+		if (arg) {
+			bsize = fp->f_vnode->v_mount->mnt_stat.f_iosize;
+			fp->f_seqcount = (arg + bsize - 1) / bsize;
+			fp->f_flag |= O_RDAHEAD;
+		} else {
+			fp->f_flag &= ~O_RDAHEAD;
+		}
+		FILE_UNLOCK(fp);
+		FILEDESC_SUNLOCK(fdp);
+		break;
+
 	default:
 		error = EINVAL;
 		break;
From auryn at zirakzigil.org Thu Sep 17 14:49:44 2009 From: auryn at zirakzigil.org (Giulio Ferro) Date: Thu Sep 17 14:49:53
2009 Subject: ZFS group ownership In-Reply-To: References: <4AAB8AD0.5010302@zirakzigil.org> Message-ID: <4AB24C80.2020308@zirakzigil.org> Nate Eldredge wrote: > On SysV, you can get BSD-type behavior by setting the sgid bit on the > directory in question, e.g. "chmod g+s dir". Then new files will > inherit their group from the directory. I suspect this will work on > FreeBSD/ZFS too even though "chmod g+s" on a directory is undocumented. > Yes, it does. Thanks, I'll use this for my needs. Giulio. From delphij at delphij.net Thu Sep 17 22:26:53 2009 From: delphij at delphij.net (Xin LI) Date: Thu Sep 17 22:27:19 2009 Subject: fcntl(F_RDAHEAD) In-Reply-To: <20090917101526.GF57619@rambler-co.ru> References: <20090917101526.GF57619@rambler-co.ru> Message-ID: <4AB2B7A1.5000601@delphij.net> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, Igor, Igor Sysoev wrote: > Hi, > > nginx-0.8.15 can use completely non-blocking sendfile() using SF_NODISKIO > flag. When sendfile() returns EBUSY, nginx calls aio_read() to read single > byte. The first aio_read() preloads the first 128K part of a file in VM cache, > however, all successive aio_read()s preload just 16K parts of the file. > This makes non-blocking sendfile() usage ineffective for files larger > than 128K. > > I've created a small patch for Darwin compatible F_RDAHEAD fcntl: > > fcntl(fd, F_RDAHEAD, preload_size) > > There is small incompatibilty: Darwin's fcntl allows just to enable/disable > read ahead, while the proposed patch allows to set exact preload size. > > Currently the preload size affects vn_read() code path only and does not > affect on sendfile() code path. However, it can be easy extended on > sendfile() part too. The preload size is still limited by sysctl vfs.read_max. > > The patch is against FreeBSD 7.2 and was tested on FreeBSD 7.2-STABLE only. 
I have ported this as a patch against -HEAD (should apply on 8.0-R but it's too late for us to add a new feature) plus a manual page entry documenting the feature. I've used F_READAHEAD as the name, but reading the manual page, it looks like we can just use F_RDAHEAD since Darwin seems to just distinguish 0 and !=0 case so that programmers won't have to use #ifdef or something else to get code working on different platform? Cheers, - -- Xin LI http://www.delphij.net/ FreeBSD - The Power to Serve! -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.12 (FreeBSD) iEYEARECAAYFAkqyt40ACgkQi+vbBBjt66AdKgCfXOo/Vn+zw0cCjS+gGJUgPo8t WToAmgKIXaVKsKUcqVOqTwHl4eTFsbkM =uP3m -----END PGP SIGNATURE----- -------------- next part -------------- Index: lib/libc/sys/fcntl.2 =================================================================== --- lib/libc/sys/fcntl.2 (revision 197297) +++ lib/libc/sys/fcntl.2 (working copy) @@ -28,7 +28,7 @@ .\" @(#)fcntl.2 8.2 (Berkeley) 1/12/94 .\" $FreeBSD$ .\" -.Dd March 8, 2008 +.Dd September 19, 2009 .Dt FCNTL 2 .Os .Sh NAME @@ -241,6 +241,14 @@ .Dv SA_RESTART (see .Xr sigaction 2 ) . +.It Dv F_READAHEAD +Set or clear the read ahead amount for sequential access to the third +argument, +.Fa arg , +which is rounded up to the nearest block size. +A zero value in +.Fa arg +turns off read ahead. 
.El .Pp When a shared lock has been set on a segment of a file, Index: sys/kern/kern_descrip.c =================================================================== --- sys/kern/kern_descrip.c (revision 197297) +++ sys/kern/kern_descrip.c (working copy) @@ -421,6 +421,7 @@ struct vnode *vp; int error, flg, tmp; int vfslocked; + uint64_t bsize; vfslocked = 0; error = 0; @@ -686,6 +687,31 @@ vfslocked = 0; fdrop(fp, td); break; + + case F_READAHEAD: + FILEDESC_SLOCK(fdp); + if ((fp = fdtofp(fd, fdp)) == NULL) { + FILEDESC_SUNLOCK(fdp); + error = EBADF; + break; + } + if (fp->f_type != DTYPE_VNODE) { + FILEDESC_SUNLOCK(fdp); + error = EBADF; + break; + } + fhold(fp); + FILEDESC_SUNLOCK(fdp); + if (arg) { + bsize = fp->f_vnode->v_mount->mnt_stat.f_iosize; + fp->f_seqcount = (arg + bsize - 1) / bsize; + fp->f_flag |= O_READAHEAD; + } else { + fp->f_flag &= ~O_READAHEAD; + } + fdrop(fp, td); + break; + default: error = EINVAL; break; Index: sys/kern/vfs_vnops.c =================================================================== --- sys/kern/vfs_vnops.c (revision 197297) +++ sys/kern/vfs_vnops.c (working copy) @@ -312,6 +312,9 @@ sequential_heuristic(struct uio *uio, struct file *fp) { + if (fp->f_flag & O_READAHEAD) + return (fp->f_seqcount << IO_SEQSHIFT); + /* * Offset 0 is handled specially. 
open() sets f_seqcount to 1 so * that the first I/O is normally considered to be slightly Index: sys/sys/fcntl.h =================================================================== --- sys/sys/fcntl.h (revision 197297) +++ sys/sys/fcntl.h (working copy) @@ -112,7 +112,11 @@ #if __BSD_VISIBLE /* Attempt to bypass buffer cache */ #define O_DIRECT 0x00010000 +#ifdef _KERNEL +/* Read ahead */ +#define O_READAHEAD 0x00020000 #endif +#endif /* Defined by POSIX Extended API Set Part 2 */ #if __BSD_VISIBLE @@ -218,6 +222,7 @@ #define F_SETLK 12 /* set record locking information */ #define F_SETLKW 13 /* F_SETLK; wait if blocked */ #define F_SETLK_REMOTE 14 /* debugging support for remote locks */ +#define F_READAHEAD 15 /* read ahead */ /* file descriptor flags (F_GETFD, F_SETFD) */ #define FD_CLOEXEC 1 /* close-on-exec flag */ From alfred at freebsd.org Thu Sep 17 22:50:36 2009 From: alfred at freebsd.org (Alfred Perlstein) Date: Thu Sep 17 22:50:44 2009 Subject: fcntl(F_RDAHEAD) In-Reply-To: <4AB2B7A1.5000601@delphij.net> References: <20090917101526.GF57619@rambler-co.ru> <4AB2B7A1.5000601@delphij.net> Message-ID: <20090917225036.GL21946@elvis.mu.org> Please do not make the option have the same name but different semantics. Strongly suggest adding the Darwin name as a toggle and a FreeBSD name as a specific size option. -Alfred * Xin LI [090917 15:27] wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hi, Igor, > > Igor Sysoev wrote: > > Hi, > > > > nginx-0.8.15 can use completely non-blocking sendfile() using SF_NODISKIO > > flag. When sendfile() returns EBUSY, nginx calls aio_read() to read single > > byte. The first aio_read() preloads the first 128K part of a file in VM cache, > > however, all successive aio_read()s preload just 16K parts of the file. > > This makes non-blocking sendfile() usage ineffective for files larger > > than 128K. 
> > > > I've created a small patch for Darwin compatible F_RDAHEAD fcntl: > > > > fcntl(fd, F_RDAHEAD, preload_size) > > > > There is small incompatibilty: Darwin's fcntl allows just to enable/disable > > read ahead, while the proposed patch allows to set exact preload size. > > > > Currently the preload size affects vn_read() code path only and does not > > affect on sendfile() code path. However, it can be easy extended on > > sendfile() part too. The preload size is still limited by sysctl vfs.read_max. > > > > The patch is against FreeBSD 7.2 and was tested on FreeBSD 7.2-STABLE only. > > I have ported this as a patch against -HEAD (should apply on 8.0-R but > it's too late for us to add a new feature) plus a manual page entry > documenting the feature. > > I've used F_READAHEAD as the name, but reading the manual page, it looks > like we can just use F_RDAHEAD since Darwin seems to just distinguish 0 > and !=0 case so that programmers won't have to use #ifdef or something > else to get code working on different platform? > > Cheers, > - -- > Xin LI http://www.delphij.net/ > FreeBSD - The Power to Serve! > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v2.0.12 (FreeBSD) > > iEYEARECAAYFAkqyt40ACgkQi+vbBBjt66AdKgCfXOo/Vn+zw0cCjS+gGJUgPo8t > WToAmgKIXaVKsKUcqVOqTwHl4eTFsbkM > =uP3m > -----END PGP SIGNATURE----- > Index: lib/libc/sys/fcntl.2 > =================================================================== > --- lib/libc/sys/fcntl.2 (revision 197297) > +++ lib/libc/sys/fcntl.2 (working copy) > @@ -28,7 +28,7 @@ > .\" @(#)fcntl.2 8.2 (Berkeley) 1/12/94 > .\" $FreeBSD$ > .\" > -.Dd March 8, 2008 > +.Dd September 19, 2009 > .Dt FCNTL 2 > .Os > .Sh NAME > @@ -241,6 +241,14 @@ > .Dv SA_RESTART > (see > .Xr sigaction 2 ) . > +.It Dv F_READAHEAD > +Set or clear the read ahead amount for sequential access to the third > +argument, > +.Fa arg , > +which is rounded up to the nearest block size. > +A zero value in > +.Fa arg > +turns off read ahead. 
> .El > .Pp > When a shared lock has been set on a segment of a file, > Index: sys/kern/kern_descrip.c > =================================================================== > --- sys/kern/kern_descrip.c (revision 197297) > +++ sys/kern/kern_descrip.c (working copy) > @@ -421,6 +421,7 @@ > struct vnode *vp; > int error, flg, tmp; > int vfslocked; > + uint64_t bsize; > > vfslocked = 0; > error = 0; > @@ -686,6 +687,31 @@ > vfslocked = 0; > fdrop(fp, td); > break; > + > + case F_READAHEAD: > + FILEDESC_SLOCK(fdp); > + if ((fp = fdtofp(fd, fdp)) == NULL) { > + FILEDESC_SUNLOCK(fdp); > + error = EBADF; > + break; > + } > + if (fp->f_type != DTYPE_VNODE) { > + FILEDESC_SUNLOCK(fdp); > + error = EBADF; > + break; > + } > + fhold(fp); > + FILEDESC_SUNLOCK(fdp); > + if (arg) { > + bsize = fp->f_vnode->v_mount->mnt_stat.f_iosize; > + fp->f_seqcount = (arg + bsize - 1) / bsize; > + fp->f_flag |= O_READAHEAD; > + } else { > + fp->f_flag &= ~O_READAHEAD; > + } > + fdrop(fp, td); > + break; > + > default: > error = EINVAL; > break; > Index: sys/kern/vfs_vnops.c > =================================================================== > --- sys/kern/vfs_vnops.c (revision 197297) > +++ sys/kern/vfs_vnops.c (working copy) > @@ -312,6 +312,9 @@ > sequential_heuristic(struct uio *uio, struct file *fp) > { > > + if (fp->f_flag & O_READAHEAD) > + return (fp->f_seqcount << IO_SEQSHIFT); > + > /* > * Offset 0 is handled specially. 
open() sets f_seqcount to 1 so > * that the first I/O is normally considered to be slightly > Index: sys/sys/fcntl.h > =================================================================== > --- sys/sys/fcntl.h (revision 197297) > +++ sys/sys/fcntl.h (working copy) > @@ -112,7 +112,11 @@ > #if __BSD_VISIBLE > /* Attempt to bypass buffer cache */ > #define O_DIRECT 0x00010000 > +#ifdef _KERNEL > +/* Read ahead */ > +#define O_READAHEAD 0x00020000 > #endif > +#endif > > /* Defined by POSIX Extended API Set Part 2 */ > #if __BSD_VISIBLE > @@ -218,6 +222,7 @@ > #define F_SETLK 12 /* set record locking information */ > #define F_SETLKW 13 /* F_SETLK; wait if blocked */ > #define F_SETLK_REMOTE 14 /* debugging support for remote locks */ > +#define F_READAHEAD 15 /* read ahead */ > > /* file descriptor flags (F_GETFD, F_SETFD) */ > #define FD_CLOEXEC 1 /* close-on-exec flag */ > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" -- - Alfred Perlstein .- AMA, VMOA #5191, 03 vmax, 92 gs500, 85 ch250 .- FreeBSD committer From is at rambler-co.ru Fri Sep 18 04:40:30 2009 From: is at rambler-co.ru (Igor Sysoev) Date: Fri Sep 18 04:40:37 2009 Subject: fcntl(F_RDAHEAD) In-Reply-To: <20090917225036.GL21946@elvis.mu.org> References: <20090917101526.GF57619@rambler-co.ru> <4AB2B7A1.5000601@delphij.net> <20090917225036.GL21946@elvis.mu.org> Message-ID: <20090918044008.GB85663@rambler-co.ru> On Thu, Sep 17, 2009 at 03:50:36PM -0700, Alfred Perlstein wrote: > Please do not make the option have the same name but different > semantics. > > Strongly suggest adding the Darwin name as a toggle and a FreeBSD > name as a specific size option. Then it may be: case F_RDAHEAD: arg = arg ? 
128 * 1024: 0; /* FALLTHROUGH F_READAHEAD */ case F_READAHEAD: > -Alfred > > * Xin LI [090917 15:27] wrote: > > -----BEGIN PGP SIGNED MESSAGE----- > > Hash: SHA1 > > > > Hi, Igor, > > > > Igor Sysoev wrote: > > > Hi, > > > > > > nginx-0.8.15 can use completely non-blocking sendfile() using SF_NODISKIO > > > flag. When sendfile() returns EBUSY, nginx calls aio_read() to read single > > > byte. The first aio_read() preloads the first 128K part of a file in VM cache, > > > however, all successive aio_read()s preload just 16K parts of the file. > > > This makes non-blocking sendfile() usage ineffective for files larger > > > than 128K. > > > > > > I've created a small patch for Darwin compatible F_RDAHEAD fcntl: > > > > > > fcntl(fd, F_RDAHEAD, preload_size) > > > > > > There is small incompatibilty: Darwin's fcntl allows just to enable/disable > > > read ahead, while the proposed patch allows to set exact preload size. > > > > > > Currently the preload size affects vn_read() code path only and does not > > > affect on sendfile() code path. However, it can be easy extended on > > > sendfile() part too. The preload size is still limited by sysctl vfs.read_max. > > > > > > The patch is against FreeBSD 7.2 and was tested on FreeBSD 7.2-STABLE only. > > > > I have ported this as a patch against -HEAD (should apply on 8.0-R but > > it's too late for us to add a new feature) plus a manual page entry > > documenting the feature. > > > > I've used F_READAHEAD as the name, but reading the manual page, it looks > > like we can just use F_RDAHEAD since Darwin seems to just distinguish 0 > > and !=0 case so that programmers won't have to use #ifdef or something > > else to get code working on different platform? > > > > Cheers, > > - -- > > Xin LI http://www.delphij.net/ > > FreeBSD - The Power to Serve! 
> > -----BEGIN PGP SIGNATURE----- > > Version: GnuPG v2.0.12 (FreeBSD) > > > > iEYEARECAAYFAkqyt40ACgkQi+vbBBjt66AdKgCfXOo/Vn+zw0cCjS+gGJUgPo8t > > WToAmgKIXaVKsKUcqVOqTwHl4eTFsbkM > > =uP3m > > -----END PGP SIGNATURE----- > > > Index: lib/libc/sys/fcntl.2 > > =================================================================== > > --- lib/libc/sys/fcntl.2 (revision 197297) > > +++ lib/libc/sys/fcntl.2 (working copy) > > @@ -28,7 +28,7 @@ > > .\" @(#)fcntl.2 8.2 (Berkeley) 1/12/94 > > .\" $FreeBSD$ > > .\" > > -.Dd March 8, 2008 > > +.Dd September 19, 2009 > > .Dt FCNTL 2 > > .Os > > .Sh NAME > > @@ -241,6 +241,14 @@ > > .Dv SA_RESTART > > (see > > .Xr sigaction 2 ) . > > +.It Dv F_READAHEAD > > +Set or clear the read ahead amount for sequential access to the third > > +argument, > > +.Fa arg , > > +which is rounded up to the nearest block size. > > +A zero value in > > +.Fa arg > > +turns off read ahead. > > .El > > .Pp > > When a shared lock has been set on a segment of a file, > > Index: sys/kern/kern_descrip.c > > =================================================================== > > --- sys/kern/kern_descrip.c (revision 197297) > > +++ sys/kern/kern_descrip.c (working copy) > > @@ -421,6 +421,7 @@ > > struct vnode *vp; > > int error, flg, tmp; > > int vfslocked; > > + uint64_t bsize; > > > > vfslocked = 0; > > error = 0; > > @@ -686,6 +687,31 @@ > > vfslocked = 0; > > fdrop(fp, td); > > break; > > + > > + case F_READAHEAD: > > + FILEDESC_SLOCK(fdp); > > + if ((fp = fdtofp(fd, fdp)) == NULL) { > > + FILEDESC_SUNLOCK(fdp); > > + error = EBADF; > > + break; > > + } > > + if (fp->f_type != DTYPE_VNODE) { > > + FILEDESC_SUNLOCK(fdp); > > + error = EBADF; > > + break; > > + } > > + fhold(fp); > > + FILEDESC_SUNLOCK(fdp); > > + if (arg) { > > + bsize = fp->f_vnode->v_mount->mnt_stat.f_iosize; > > + fp->f_seqcount = (arg + bsize - 1) / bsize; > > + fp->f_flag |= O_READAHEAD; > > + } else { > > + fp->f_flag &= ~O_READAHEAD; > > + } > > + fdrop(fp, td); > > + break; > > 
+ > > default: > > error = EINVAL; > > break; > > Index: sys/kern/vfs_vnops.c > > =================================================================== > > --- sys/kern/vfs_vnops.c (revision 197297) > > +++ sys/kern/vfs_vnops.c (working copy) > > @@ -312,6 +312,9 @@ > > sequential_heuristic(struct uio *uio, struct file *fp) > > { > > > > + if (fp->f_flag & O_READAHEAD) > > + return (fp->f_seqcount << IO_SEQSHIFT); > > + > > /* > > * Offset 0 is handled specially. open() sets f_seqcount to 1 so > > * that the first I/O is normally considered to be slightly > > Index: sys/sys/fcntl.h > > =================================================================== > > --- sys/sys/fcntl.h (revision 197297) > > +++ sys/sys/fcntl.h (working copy) > > @@ -112,7 +112,11 @@ > > #if __BSD_VISIBLE > > /* Attempt to bypass buffer cache */ > > #define O_DIRECT 0x00010000 > > +#ifdef _KERNEL > > +/* Read ahead */ > > +#define O_READAHEAD 0x00020000 > > #endif > > +#endif > > > > /* Defined by POSIX Extended API Set Part 2 */ > > #if __BSD_VISIBLE > > @@ -218,6 +222,7 @@ > > #define F_SETLK 12 /* set record locking information */ > > #define F_SETLKW 13 /* F_SETLK; wait if blocked */ > > #define F_SETLK_REMOTE 14 /* debugging support for remote locks */ > > +#define F_READAHEAD 15 /* read ahead */ > > > > /* file descriptor flags (F_GETFD, F_SETFD) */ > > #define FD_CLOEXEC 1 /* close-on-exec flag */ > > > _______________________________________________ > > freebsd-hackers@freebsd.org mailing list > > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" > > -- > - Alfred Perlstein > .- AMA, VMOA #5191, 03 vmax, 92 gs500, 85 ch250 > .- FreeBSD committer -- Igor Sysoev http://sysoev.ru/en/ From jhs at berklix.com Fri Sep 18 07:12:01 2009 From: jhs at berklix.com (Julian H. 
Stacey) Date: Fri Sep 18 07:12:08 2009 Subject: genuine cpu I386_CPU kernel support Message-ID: <200909172209.n8HM9k4q043009@fire.js.berklix.net> Hi hackers, I'm trying to get my Genuine 386 running 7.2. It currently runs 4.11. 386 was first base of FreeBSD, a shame to lose it. So far I've hacked diffs as below + the normal /etc/make.conf CFLAGS += -march=i386 cross compiled all bins libs etc & setenv DESTDIR /usr/7.2 i cd /usr/src/etc l make distrib-dirs cd .. ; make install But manually unloading 4.11 kernel & loading 7.2 kernel & booting doesnt yet boot far enough to encourage me to move bins yet, I think I need to do a bit more kernel before that ? This is what I gave so far. Input welcome. *** /pri/freebsd/releases/7.2-RELEASE/src/sys/./conf/options.i386 Wed Apr 15 05:14:26 2009 --- /usr/src/sys/./conf/options.i386 Thu Sep 17 10:53:11 2009 *************** *** 71,76 **** --- 71,78 ---- NO_MEMORY_HOLE opt_cpu.h # The CPU type affects the endian conversion functions all over the kernel. + // jhs@berklix added I386_CPU + I386_CPU opt_global.h I486_CPU opt_global.h I586_CPU opt_global.h I686_CPU opt_global.h *** /pri/freebsd/releases/7.2-RELEASE/src/sys/./crypto/blowfish/arch/i386/bf_enc.S Wed Apr 15 05:14:26 2009 --- /usr/src/sys/./crypto/blowfish/arch/i386/bf_enc.S Thu Sep 17 10:54:51 2009 *************** *** 10,16 **** * XXX Should use CPP symbols defined as a result of * XXX `cc -mcpu=pentiumpro'. */ ! #if defined(I486_CPU) || defined(I586_CPU) #include "bf_enc_586.S" #else #include "bf_enc_686.S" --- 10,17 ---- * XXX Should use CPP symbols defined as a result of * XXX `cc -mcpu=pentiumpro'. */ ! // jhs@berklix added I386_CPU ! 
#if defined(I386_CPU) || defined(I486_CPU) || defined(I586_CPU) #include "bf_enc_586.S" #else #include "bf_enc_686.S" *** /pri/freebsd/releases/7.2-RELEASE/src/sys/./i386/conf/GENERIC Wed Apr 15 05:14:26 2009 --- /usr/src/sys/./i386/conf/GENERIC Thu Sep 17 10:56:26 2009 *************** *** 18,23 **** --- 18,24 ---- # # $FreeBSD: src/sys/i386/conf/GENERIC,v 1.474.2.17.2.1 2009/04/15 03:14:26 kensmith Exp $ + cpu I386_CPU # jhs@berklix added I386_CPU cpu I486_CPU cpu I586_CPU cpu I686_CPU *** /pri/freebsd/releases/7.2-RELEASE/src/sys/./i386/i386/identcpu.c Wed Apr 15 05:14:26 2009 --- /usr/src/sys/./i386/i386/identcpu.c Thu Sep 17 11:05:05 2009 *************** *** 622,627 **** --- 622,628 ---- break; case CPUCLASS_386: printf("386"); + // jhs@berklix do we need to add code ? break; #if defined(I486_CPU) case CPUCLASS_486: *************** *** 909,915 **** { #if !defined(lint) ! #if !defined(I486_CPU) && !defined(I586_CPU) && !defined(I686_CPU) #error This kernel is not configured for one of the supported CPUs #endif #else /* lint */ --- 910,917 ---- { #if !defined(lint) ! // jhs@berklix added I386_CPU ! #if !defined(I386_CPU) && !defined(I486_CPU) && !defined(I586_CPU) && !defined(I686_CPU) #error This kernel is not configured for one of the supported CPUs #endif #else /* lint */ *************** *** 920,926 **** --- 922,930 ---- */ switch (cpu_class) { case CPUCLASS_286: /* a 286 should not make it this far, anyway */ + #if !defined(I386_CPU) // jhs@berklix added I386_CPU case CPUCLASS_386: + #endif // jhs@berklix added I386_CPU #if !defined(I486_CPU) case CPUCLASS_486: #endif Cheers, Julian -- Julian Stacey: BSD Unix Linux C Sys Eng Consultants Munich http://berklix.com Mail ASCII plain text not HTML & Base64. http://asciiribbon.org Virused Microsoft PCs cause spam. 
http://berklix.com/free/ From kostikbel at gmail.com Fri Sep 18 07:40:55 2009 From: kostikbel at gmail.com (Kostik Belousov) Date: Fri Sep 18 07:41:01 2009 Subject: fcntl(F_RDAHEAD) In-Reply-To: <4AB2B7A1.5000601@delphij.net> References: <20090917101526.GF57619@rambler-co.ru> <4AB2B7A1.5000601@delphij.net> Message-ID: <20090918074027.GI47688@deviant.kiev.zoral.com.ua> On Thu, Sep 17, 2009 at 03:26:41PM -0700, Xin LI wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hi, Igor, > > Igor Sysoev wrote: > > Hi, > > > > nginx-0.8.15 can use completely non-blocking sendfile() using SF_NODISKIO > > flag. When sendfile() returns EBUSY, nginx calls aio_read() to read single > > byte. The first aio_read() preloads the first 128K part of a file in VM cache, > > however, all successive aio_read()s preload just 16K parts of the file. > > This makes non-blocking sendfile() usage ineffective for files larger > > than 128K. > > > > I've created a small patch for Darwin compatible F_RDAHEAD fcntl: > > > > fcntl(fd, F_RDAHEAD, preload_size) > > > > There is small incompatibilty: Darwin's fcntl allows just to enable/disable > > read ahead, while the proposed patch allows to set exact preload size. > > > > Currently the preload size affects vn_read() code path only and does not > > affect on sendfile() code path. However, it can be easy extended on > > sendfile() part too. The preload size is still limited by sysctl vfs.read_max. > > > > The patch is against FreeBSD 7.2 and was tested on FreeBSD 7.2-STABLE only. > > I have ported this as a patch against -HEAD (should apply on 8.0-R but > it's too late for us to add a new feature) plus a manual page entry > documenting the feature. > > I've used F_READAHEAD as the name, but reading the manual page, it looks > like we can just use F_RDAHEAD since Darwin seems to just distinguish 0 > and !=0 case so that programmers won't have to use #ifdef or something > else to get code working on different platform? 
What I dislike about the patch is the new kernel-private flag that is eaten from the open(2) flags namespace. We do already have FHASLOCK, so far the only such flag. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 196 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-hackers/attachments/20090918/48b8ef92/attachment.pgp From shuvaev at physik.uni-wuerzburg.de Fri Sep 18 13:16:40 2009 From: shuvaev at physik.uni-wuerzburg.de (Alexey Shuvaev) Date: Fri Sep 18 13:16:48 2009 Subject: genuine cpu I386_CPU kernel support In-Reply-To: <200909172209.n8HM9k4q043009@fire.js.berklix.net> References: <200909172209.n8HM9k4q043009@fire.js.berklix.net> Message-ID: <20090918125659.GA88218@wep4035.physik.uni-wuerzburg.de> On Fri, Sep 18, 2009 at 12:09:46AM +0200, Julian H. Stacey wrote: > Hi hackers, > I'm trying to get my Genuine 386 running 7.2. It currently runs 4.11. > 386 was first base of FreeBSD, a shame to lose it. > So far I've hacked diffs as below + the normal > /etc/make.conf CFLAGS += -march=i386 > cross compiled all bins libs etc & > setenv DESTDIR /usr/7.2 i > cd /usr/src/etc l make distrib-dirs > cd .. ; make install > But manually unloading 4.11 kernel & loading 7.2 kernel & booting > doesnt yet boot far enough to encourage me to move bins yet, > I think I need to do a bit more kernel before that ? > This is what I gave so far. Input welcome. > > [snip] > Have you already looked at svn r137784 (and possibly some later commits)? http://svn.freebsd.org/viewvc/base?view=revision&revision=137784 0.02$, Alexey. From remodeler at alentogroup.org Fri Sep 18 18:39:17 2009 From: remodeler at alentogroup.org (remodeler) Date: Fri Sep 18 18:39:24 2009 Subject: SOLUTION MBR hack for serial console Message-ID: <20090918182520.M99301@alentogroup.org> This is a solution to the problem I had. I think others might struggle with it too. 
John Baldwin kindly helped on this list. The FreeBSD handbook article on setting up serial consoles says "Only sio0 through sio3 (COM1 through COM4) can be used; multiport serial cards will not work". I have a recent motherboard that does not have an on-board serial port, only PCI / PCI-e expansion slots. PCI dynamically assigns addresses to devices, and I cannot assign a COM1-COM4 address (0x3F8, 0x2F8, 0x3E8 or 0x2E8) to my single-port PCI serial card because a different PCI-PCI bridge has a bit set claiming the legacy addresses. I cannot modify the use of the legacy COM address for the serial console in the boot0 code, due to its single-sector size and complexity. It uses PC BIOS calls to access a serial console, and I do not have access to the proprietary motherboard BIOS to change the mapping below boot0. For this reason, creating /boot.config with a flag enabling the serial console (-P, etc.) causes a lockup on boot before boot0 outputs the "F1: FreeBSD" menu. I can catch the serial console during initialization of the loader, though, and can drop to single-user mode or the loader prompt remotely.

I specified my non-legacy serial console address in /etc/make.conf and rebuilt the kernel:

BOOT_COMCONSOLE_PORT= 0xE800

I set the port and speed in the boot2 Makefile (/usr/src/sys/boot/i386; it's an AMD64 machine, but amd64 still uses the i386 boot blocks):

BOOT_COMCONSOLE_PORT?= 0xe800
BOOT_COMCONSOLE_SPEED?= 115200

I rebuilt the boot blocks and wrote them out:

# cd /sys/boot
# make clean
# make
# make install
# bsdlabel -B /dev/boot_disk

I added the console flag to the serial device in /boot/device.hints (this could be the sio driver instead of uart; this is on 9.0-HEAD):

hint.uart.0.port="0xE800"
hint.uart.0.flags="0x10"
hint.uart.0.irq="20"

Finally, I set these environment variables in /boot/loader.conf:

console="comconsole"
comconsole_speed="115200"

My goal was a headless remote server without high-end server gear, so I didn't set "boot_multicons".
I didn't see any effect from setting "boot_serial" in the loader configuration file. I have a fully functioning serial console ;) __ __ ________ ____ ___ ____ ____/ /__ / /__ _____ / ___/ _ \/ __ `__ \/ __ \/ __ / _ \/ / _ \/ ___/ / / / __/ / / / / / /_/ / /_/ / __/ / __/ / /_/ \___/_/ /_/ /_/\____/\__,_/\___/_/\___/_/ The information contained in this message is confidential and is intended for the addressee only. Any unauthorized use, dissemination of the information, or copying of this message is prohibited. From remodeler at alentogroup.org Fri Sep 18 23:49:04 2009 From: remodeler at alentogroup.org (remodeler) Date: Fri Sep 18 23:49:12 2009 Subject: SOLUTION MBR hack for serial console Message-ID: <20090918235822.M19955@alentogroup.org> > I found your Email most helpful thank you. if I may ask, how do you > get vim to display the correct rows and columns? > I am using a xterm on a 1440x900 monitor and ssh to another FreeBSD > machine that has my serial cable to my server. > vim appears a very small, do you know how to change this? > > Sam Fourman Jr. I left out the change to /etc/ttys in my original solution e-mail; something like the following sets a line type for the tty:

ttyu0 "/usr/libexec/getty std.9600" vt100 on insecure

vt100 is an 80-column by 24-line terminal. I think using vt100-w instead would give you a 132-column line (described in termcap(5)).
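On the client side of a setup like Sam's (a second machine with the serial cable attached), the program that talks to the console has to put the local port into raw mode at the same speed as the console, for example 115200 to match comconsole_speed above; tools such as cu(1) do this for you. What follows is a minimal POSIX termios sketch of that step. The device path (for instance a uart(4) call-out device such as /dev/cuau0) is an assumption, and open_serial() and make_raw_8n1() are hypothetical helper names.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <termios.h>
#include <unistd.h>

/*
 * Fill in raw 8N1 settings at the given speed: 8 data bits, no parity,
 * one stop bit, no echo, no line editing, no signal characters.
 * Returns 0 or an errno value.
 */
int
make_raw_8n1(struct termios *t, speed_t speed)
{
	assert(t != NULL);

	t->c_iflag &= ~(IGNBRK | BRKINT | PARMRK | ISTRIP | INLCR | IGNCR |
	    ICRNL | IXON);
	t->c_oflag &= ~OPOST;
	t->c_lflag &= ~(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
	t->c_cflag &= ~(CSIZE | PARENB | CSTOPB);
	t->c_cflag |= CS8 | CLOCAL | CREAD;
	t->c_cc[VMIN] = 1;	/* read() returns after at least one byte */
	t->c_cc[VTIME] = 0;	/* no inter-byte timeout */
	if (cfsetispeed(t, speed) == -1 || cfsetospeed(t, speed) == -1)
		return (errno);
	return (0);
}

/*
 * Open a serial device (path supplied by the caller) and configure it.
 * Returns a descriptor, or -1 with errno set.
 */
int
open_serial(const char *path, speed_t speed)
{
	struct termios t;
	int fd, err;

	fd = open(path, O_RDWR | O_NOCTTY);
	if (fd == -1)
		return (-1);
	if (tcgetattr(fd, &t) == -1)
		err = errno;
	else if ((err = make_raw_8n1(&t, speed)) == 0 &&
	    tcsetattr(fd, TCSANOW, &t) == -1)
		err = errno;
	if (err != 0) {
		close(fd);
		errno = err;
		return (-1);
	}
	return (fd);
}
```

A call such as open_serial("/dev/cuau0", B115200) would return a descriptor ready for read() and write(). B115200 is not in the traditional POSIX list of speeds, which stops at B38400, but both FreeBSD and Linux provide it.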
From sojdaa at iem.pw.edu.pl Sun Sep 20 15:02:37 2009 From: sojdaa at iem.pw.edu.pl (sojdaa) Date: Sun Sep 20 15:02:43 2009 Subject: FreeBSD SVN repository mirror with local branches ability Message-ID: <969df365997cf2a4449b74c6af9e52e9@iem.pw.edu.pl> Hello, As the subject says, I want to set up a mirror of the FreeBSD SVN repository and would like to be able to create my own branches that can be merged with the synchronized local mirror. I've done this using svk after reading the Subversion primer: http://wiki.freebsd.org/SubversionPrimer, but I'm wondering whether there are other ways to do this that avoid Perl, because svk is a set of Perl scripts. I want to keep the whole system as simple as possible. I have read some threads about git and Mercurial working with svn, but those are different version control systems. Unfortunately, svnsync can only create read-only mirrors. Is there any way to use only the svn tools, like svnadmin and svnsync, and combine them to create a writable mirror, or is there no point in trying tools other than svk? Thanks for any help! Arek From is at rambler-co.ru Mon Sep 21 11:12:47 2009 From: is at rambler-co.ru (Igor Sysoev) Date: Mon Sep 21 11:12:54 2009 Subject: fcntl(F_RDAHEAD) In-Reply-To: <20090918074027.GI47688@deviant.kiev.zoral.com.ua> References: <20090917101526.GF57619@rambler-co.ru> <4AB2B7A1.5000601@delphij.net> <20090918074027.GI47688@deviant.kiev.zoral.com.ua> Message-ID: <20090921111245.GB23958@rambler-co.ru> On Fri, Sep 18, 2009 at 10:40:27AM +0300, Kostik Belousov wrote: > On Thu, Sep 17, 2009 at 03:26:41PM -0700, Xin LI wrote: > > -----BEGIN PGP SIGNED MESSAGE----- > > Hash: SHA1 > > > > Hi, Igor, > > > > Igor Sysoev wrote: > > > Hi, > > > > > > nginx-0.8.15 can use completely non-blocking sendfile() using SF_NODISKIO > > > flag. When sendfile() returns EBUSY, nginx calls aio_read() to read single > > > byte.
The first aio_read() preloads the first 128K part of a file in VM cache, > > > however, all successive aio_read()s preload just 16K parts of the file. > > > This makes non-blocking sendfile() usage ineffective for files larger > > > than 128K. > > > > > > I've created a small patch for Darwin compatible F_RDAHEAD fcntl: > > > > > > fcntl(fd, F_RDAHEAD, preload_size) > > > > > > There is small incompatibilty: Darwin's fcntl allows just to enable/disable > > > read ahead, while the proposed patch allows to set exact preload size. > > > > > > Currently the preload size affects vn_read() code path only and does not > > > affect on sendfile() code path. However, it can be easy extended on > > > sendfile() part too. The preload size is still limited by sysctl vfs.read_max. > > > > > > The patch is against FreeBSD 7.2 and was tested on FreeBSD 7.2-STABLE only. > > > > I have ported this as a patch against -HEAD (should apply on 8.0-R but > > it's too late for us to add a new feature) plus a manual page entry > > documenting the feature. > > > > I've used F_READAHEAD as the name, but reading the manual page, it looks > > like we can just use F_RDAHEAD since Darwin seems to just distinguish 0 > > and !=0 case so that programmers won't have to use #ifdef or something > > else to get code working on different platform? > > What I dislike about the patch is the new kernel-private flag that is > eaten from the open(2) flags namespace. We do already have FHASLOCK, > so far the only such flag.

We can change

int f_seqcount;

to

u_int f_seqcount;

and use the highest bit instead of O_READAHEAD; f_seqcount is shifted left by 16 bits anyway.
-- Igor Sysoev http://sysoev.ru/en/ From kostikbel at gmail.com Mon Sep 21 11:29:30 2009 From: kostikbel at gmail.com (Kostik Belousov) Date: Mon Sep 21 11:29:37 2009 Subject: fcntl(F_RDAHEAD) In-Reply-To: <20090921111245.GB23958@rambler-co.ru> References: <20090917101526.GF57619@rambler-co.ru> <4AB2B7A1.5000601@delphij.net> <20090918074027.GI47688@deviant.kiev.zoral.com.ua> <20090921111245.GB23958@rambler-co.ru> Message-ID: <20090921112909.GQ47688@deviant.kiev.zoral.com.ua> On Mon, Sep 21, 2009 at 03:12:45PM +0400, Igor Sysoev wrote: > On Fri, Sep 18, 2009 at 10:40:27AM +0300, Kostik Belousov wrote: > > > On Thu, Sep 17, 2009 at 03:26:41PM -0700, Xin LI wrote: > > > -----BEGIN PGP SIGNED MESSAGE----- > > > Hash: SHA1 > > > > > > Hi, Igor, > > > > > > Igor Sysoev wrote: > > > > Hi, > > > > > > > > nginx-0.8.15 can use completely non-blocking sendfile() using SF_NODISKIO > > > > flag. When sendfile() returns EBUSY, nginx calls aio_read() to read single > > > > byte. The first aio_read() preloads the first 128K part of a file in VM cache, > > > > however, all successive aio_read()s preload just 16K parts of the file. > > > > This makes non-blocking sendfile() usage ineffective for files larger > > > > than 128K. > > > > > > > > I've created a small patch for Darwin compatible F_RDAHEAD fcntl: > > > > > > > > fcntl(fd, F_RDAHEAD, preload_size) > > > > > > > > There is small incompatibilty: Darwin's fcntl allows just to enable/disable > > > > read ahead, while the proposed patch allows to set exact preload size. > > > > > > > > Currently the preload size affects vn_read() code path only and does not > > > > affect on sendfile() code path. However, it can be easy extended on > > > > sendfile() part too. The preload size is still limited by sysctl vfs.read_max. > > > > > > > > The patch is against FreeBSD 7.2 and was tested on FreeBSD 7.2-STABLE only. 
> > > > > > I have ported this as a patch against -HEAD (should apply on 8.0-R but > > > it's too late for us to add a new feature) plus a manual page entry > > > documenting the feature. > > > > > > I've used F_READAHEAD as the name, but reading the manual page, it looks > > > like we can just use F_RDAHEAD since Darwin seems to just distinguish 0 > > > and !=0 case so that programmers won't have to use #ifdef or something > > > else to get code working on different platform? > > > > What I dislike about the patch is the new kernel-private flag that is > > eaten from the open(2) flags namespace. We do already have FHASLOCK, > > so far the only such flag. > > We can change > int f_seqcount; > to > u_int f_seqcount; > > and can use highest bit instead of O_READAHEAD: anyway f_seqcount is shifted > to 16 bits left. Or do the same trick as was done for FHASLOCK and override some flag that is not saved after open; see FMASK. Or split f_seqcount into two u_short fields, one for f_seqcount and the second for f_kflag, and use the latter for FHASLOCK and FREADAHEAD. [We are trying not to grow struct file unless absolutely necessary.] -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 196 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-hackers/attachments/20090921/8b5df895/attachment.pgp From jhs at berklix.com Mon Sep 21 12:01:26 2009 From: jhs at berklix.com (Julian H. Stacey) Date: Mon Sep 21 12:01:33 2009 Subject: genuine cpu I386_CPU kernel support In-Reply-To: Your message "Fri, 18 Sep 2009 14:56:59 +0200." <20090918125659.GA88218@wep4035.physik.uni-wuerzburg.de> Message-ID: <200909211203.n8LC3hhn090227@fire.js.berklix.net> Hi, Reference: > From: Alexey Shuvaev > Date: Fri, 18 Sep 2009 14:56:59 +0200 > Message-id: <20090918125659.GA88218@wep4035.physik.uni-wuerzburg.de> Alexey Shuvaev wrote: > On Fri, Sep 18, 2009 at 12:09:46AM +0200, Julian H.
Stacey wrote: > > Hi hackers, > > I'm trying to get my Genuine 386 running 7.2. It currently runs 4.11. > > 386 was first base of FreeBSD, a shame to lose it. > > So far I've hacked diffs as below + the normal > > /etc/make.conf CFLAGS += -march=i386 > > cross compiled all bins libs etc & > > setenv DESTDIR /usr/7.2 i > > cd /usr/src/etc l make distrib-dirs > > cd .. ; make install > > But manually unloading 4.11 kernel & loading 7.2 kernel & booting > > doesnt yet boot far enough to encourage me to move bins yet, > > I think I need to do a bit more kernel before that ? > > This is what I gave so far. Input welcome. > > > > [snip] > > > Have you already looked at svn r137784 (and possibly some later commits)? > http://svn.freebsd.org/viewvc/base?view=revision&revision=137784 > > 0.02$, > Alexey. Thanks Alexey. No, I hadn't seen that; I've only had a quick look so far. I'll look more closely to see what to change to compile my 80386 kernel. PS: I cc'd jhb@, who seems to be the one who removed 80386 support. Maybe he has a patch set or a comment. Cheers, Julian -- Julian Stacey: BSD Unix Linux C Sys Eng Consultants Munich http://berklix.com Mail ASCII plain text not HTML & Base64. http://asciiribbon.org Virused Microsoft PCs cause spam. http://berklix.com/free/ From luizgustavo at luizgustavo.pro.br Mon Sep 21 18:47:15 2009 From: luizgustavo at luizgustavo.pro.br (Luiz Gustavo S. Costa) Date: Mon Sep 21 18:47:22 2009 Subject: HAMMER FS port (status ?) Message-ID: <772ca7d0909211122w67791f27q1b118ad7f20fbb58@mail.gmail.com> Hi guys! Is anyone porting DragonFly's HAMMER file system to FreeBSD? If so, what is the status of the port? Where can I find more information?
thanks -- Luiz Gustavo Costa (Powered by BSD) *+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+ mundoUnix - Consultoria em Software Livre http://www.mundounix.com.br ICQ: 2890831 / MSN: contato@mundounix.com.br From is at rambler-co.ru Tue Sep 22 07:28:50 2009 From: is at rambler-co.ru (Igor Sysoev) Date: Tue Sep 22 07:28:58 2009 Subject: fcntl(F_RDAHEAD) In-Reply-To: <20090921112909.GQ47688@deviant.kiev.zoral.com.ua> References: <20090917101526.GF57619@rambler-co.ru> <4AB2B7A1.5000601@delphij.net> <20090918074027.GI47688@deviant.kiev.zoral.com.ua> <20090921111245.GB23958@rambler-co.ru> <20090921112909.GQ47688@deviant.kiev.zoral.com.ua> Message-ID: <20090922072848.GA727@rambler-co.ru> On Mon, Sep 21, 2009 at 02:29:09PM +0300, Kostik Belousov wrote: > On Mon, Sep 21, 2009 at 03:12:45PM +0400, Igor Sysoev wrote: > > > What I dislike about the patch is the new kernel-private flag that is > > > eaten from the open(2) flags namespace. We do already have FHASLOCK, > > > so far the only such flag. > > > > We can change > > int f_seqcount; > > to > > u_int f_seqcount; > > > > and can use highest bit instead of O_READAHEAD: anyway f_seqcount is shifted > > to 16 bits left. > > Or do the same trick as was done for FHASLOCK and override some flag that > is not saved after open, see FMASK. > > Or split f_seqcount into two u_short fields, one for f_seqcount, second for > f_kflag, and use the later for FHASLOCK and FREADAHEAD. [We are trying to > not grow struct file unless absolutely neccessary]. I agree that struct file should not grow (at least in this case). However, I believe splitting f_seqcount into two fields will break kernel ABI. Or not ? I think f_seqcount should be splitted in 9-CURRENT and probably, in 8-STABLE, but in 7-STABLE we may use the open(2) flags namespace. 
-- Igor Sysoev http://sysoev.ru/en/ From kostikbel at gmail.com Tue Sep 22 08:54:03 2009 From: kostikbel at gmail.com (Kostik Belousov) Date: Tue Sep 22 08:54:10 2009 Subject: fcntl(F_RDAHEAD) In-Reply-To: <20090922072848.GA727@rambler-co.ru> References: <20090917101526.GF57619@rambler-co.ru> <4AB2B7A1.5000601@delphij.net> <20090918074027.GI47688@deviant.kiev.zoral.com.ua> <20090921111245.GB23958@rambler-co.ru> <20090921112909.GQ47688@deviant.kiev.zoral.com.ua> <20090922072848.GA727@rambler-co.ru> Message-ID: <20090922085346.GR47688@deviant.kiev.zoral.com.ua> On Tue, Sep 22, 2009 at 11:28:48AM +0400, Igor Sysoev wrote: > On Mon, Sep 21, 2009 at 02:29:09PM +0300, Kostik Belousov wrote: > > > On Mon, Sep 21, 2009 at 03:12:45PM +0400, Igor Sysoev wrote: > > > > > What I dislike about the patch is the new kernel-private flag that is > > > > eaten from the open(2) flags namespace. We do already have FHASLOCK, > > > > so far the only such flag. > > > > > > We can change > > > int f_seqcount; > > > to > > > u_int f_seqcount; > > > > > > and can use highest bit instead of O_READAHEAD: anyway f_seqcount is shifted > > > to 16 bits left. > > > > Or do the same trick as was done for FHASLOCK and override some flag that > > is not saved after open, see FMASK. > > > > Or split f_seqcount into two u_short fields, one for f_seqcount, second for > > f_kflag, and use the later for FHASLOCK and FREADAHEAD. [We are trying to > > not grow struct file unless absolutely neccessary]. > > I agree that struct file should not grow (at least in this case). > However, I believe splitting f_seqcount into two fields will break > kernel ABI. Or not ? I think f_seqcount should be splitted in 9-CURRENT > and probably, in 8-STABLE, but in 7-STABLE we may use the open(2) flags > namespace. The struct file indeed participates in the KBI, in particular, pointer to it is supplied as an argument to VOP_OPEN() and d_fdopen(). 
On the other hand, it is assumed that drivers and fses use it to override f_ops and possibly f_data. f_seqcount status is internal VFS field that probably should be not accessed or modified by driver or fs. Reason to try hard to keep layout of struct file intact even between major branches is the userspace compatibility, with the code of lsof and fstat. Might be, fstat will be improved to not require this. Probably, best temporal solution would be to override some flag used only for open(2), postponing the task of separating bit- and name-spaces for other day. Also, it makes merge to 8 and 7 easier. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 196 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-hackers/attachments/20090922/d6fd7b5c/attachment.pgp From is at rambler-co.ru Tue Sep 22 09:54:30 2009 From: is at rambler-co.ru (Igor Sysoev) Date: Tue Sep 22 09:54:38 2009 Subject: fcntl(F_RDAHEAD) In-Reply-To: <20090922085346.GR47688@deviant.kiev.zoral.com.ua> References: <20090917101526.GF57619@rambler-co.ru> <4AB2B7A1.5000601@delphij.net> <20090918074027.GI47688@deviant.kiev.zoral.com.ua> <20090921111245.GB23958@rambler-co.ru> <20090921112909.GQ47688@deviant.kiev.zoral.com.ua> <20090922072848.GA727@rambler-co.ru> <20090922085346.GR47688@deviant.kiev.zoral.com.ua> Message-ID: <20090922095428.GH1152@rambler-co.ru> On Tue, Sep 22, 2009 at 11:53:46AM +0300, Kostik Belousov wrote: > On Tue, Sep 22, 2009 at 11:28:48AM +0400, Igor Sysoev wrote: > > On Mon, Sep 21, 2009 at 02:29:09PM +0300, Kostik Belousov wrote: > > > > > On Mon, Sep 21, 2009 at 03:12:45PM +0400, Igor Sysoev wrote: > > > > > > > What I dislike about the patch is the new kernel-private flag that is > > > > > eaten from the open(2) flags namespace. We do already have FHASLOCK, > > > > > so far the only such flag. 
> > > > > > > > We can change > > > > int f_seqcount; > > > > to > > > > u_int f_seqcount; > > > > > > > > and can use highest bit instead of O_READAHEAD: anyway f_seqcount is shifted > > > > to 16 bits left. > > > > > > Or do the same trick as was done for FHASLOCK and override some flag that > > > is not saved after open, see FMASK. > > > > > > Or split f_seqcount into two u_short fields, one for f_seqcount, second for > > > f_kflag, and use the later for FHASLOCK and FREADAHEAD. [We are trying to > > > not grow struct file unless absolutely neccessary]. > > > > I agree that struct file should not grow (at least in this case). > > However, I believe splitting f_seqcount into two fields will break > > kernel ABI. Or not ? I think f_seqcount should be splitted in 9-CURRENT > > and probably, in 8-STABLE, but in 7-STABLE we may use the open(2) flags > > namespace. > > The struct file indeed participates in the KBI, in particular, pointer > to it is supplied as an argument to VOP_OPEN() and d_fdopen(). On the > other hand, it is assumed that drivers and fses use it to override > f_ops and possibly f_data. f_seqcount status is internal VFS field that > probably should be not accessed or modified by driver or fs. > > Reason to try hard to keep layout of struct file intact even between major > branches is the userspace compatibility, with the code of lsof and fstat. > Might be, fstat will be improved to not require this. > > Probably, best temporal solution would be to override some flag used > only for open(2), postponing the task of separating bit- and name-spaces > for other day. Also, it makes merge to 8 and 7 easier. Well, I think O_CREAT or O_TRUNC are good candidate to be an alias for O_READAHEAD. 
-- Igor Sysoev http://sysoev.ru/en/ From is at rambler-co.ru Tue Sep 22 10:05:38 2009 From: is at rambler-co.ru (Igor Sysoev) Date: Tue Sep 22 10:05:44 2009 Subject: fcntl(F_RDAHEAD) In-Reply-To: <20090921112909.GQ47688@deviant.kiev.zoral.com.ua> References: <20090917101526.GF57619@rambler-co.ru> <4AB2B7A1.5000601@delphij.net> <20090918074027.GI47688@deviant.kiev.zoral.com.ua> <20090921111245.GB23958@rambler-co.ru> <20090921112909.GQ47688@deviant.kiev.zoral.com.ua> Message-ID: <20090922100535.GI1152@rambler-co.ru> On Mon, Sep 21, 2009 at 02:29:09PM +0300, Kostik Belousov wrote: > > > What I dislike about the patch is the new kernel-private flag that is > > > eaten from the open(2) flags namespace. We do already have FHASLOCK, > > > so far the only such flag. > > > > We can change > > int f_seqcount; > > to > > u_int f_seqcount; > > > > and can use highest bit instead of O_READAHEAD: anyway f_seqcount is shifted > > to 16 bits left. > > Or do the same trick as was done for FHASLOCK and override some flag that > is not saved after open, see FMASK.

You probably meant FPOSIXSHM, not FHASLOCK:

/*
 * We are out of bits in f_flag (which is a short). However,
 * the flag bits not set in FMASK are only meaningful in the
 * initial open syscall. Those bits can thus be given a
 * different meaning for fcntl(2).
 */
#if __BSD_VISIBLE
/*
 * Set by shm_open(3) to get automatic MAP_ASYNC behavior
 * for POSIX shared memory objects (which are otherwise
 * implemented as plain files).
 */
#define	FPOSIXSHM	O_NOFOLLOW
#endif

-- Igor Sysoev http://sysoev.ru/en/ From is at rambler-co.ru Tue Sep 22 12:33:57 2009 From: is at rambler-co.ru (Igor Sysoev) Date: Tue Sep 22 12:34:04 2009 Subject: fcntl(F_RDAHEAD) In-Reply-To: <20090918074027.GI47688@deviant.kiev.zoral.com.ua> References: <20090917101526.GF57619@rambler-co.ru> <4AB2B7A1.5000601@delphij.net> <20090918074027.GI47688@deviant.kiev.zoral.com.ua> Message-ID: <20090922123355.GA30679@rambler-co.ru> On Fri, Sep 18, 2009 at 10:40:27AM +0300, Kostik Belousov wrote: > On Thu, Sep 17, 2009 at 03:26:41PM -0700, Xin LI wrote: > > -----BEGIN PGP SIGNED MESSAGE----- > > Hash: SHA1 > > > > Hi, Igor, > > > > Igor Sysoev wrote: > > > Hi, > > > > > > nginx-0.8.15 can use completely non-blocking sendfile() using SF_NODISKIO > > > flag. When sendfile() returns EBUSY, nginx calls aio_read() to read single > > > byte. The first aio_read() preloads the first 128K part of a file in VM cache, > > > however, all successive aio_read()s preload just 16K parts of the file. > > > This makes non-blocking sendfile() usage ineffective for files larger > > > than 128K. > > > > > > I've created a small patch for Darwin compatible F_RDAHEAD fcntl: > > > > > > fcntl(fd, F_RDAHEAD, preload_size) > > > > > > There is small incompatibilty: Darwin's fcntl allows just to enable/disable > > > read ahead, while the proposed patch allows to set exact preload size. > > > > > > Currently the preload size affects vn_read() code path only and does not > > > affect on sendfile() code path. However, it can be easy extended on > > > sendfile() part too. The preload size is still limited by sysctl vfs.read_max. > > > > > > The patch is against FreeBSD 7.2 and was tested on FreeBSD 7.2-STABLE only. > > > > I have ported this as a patch against -HEAD (should apply on 8.0-R but > > it's too late for us to add a new feature) plus a manual page entry > > documenting the feature.
> > > > I've used F_READAHEAD as the name, but reading the manual page, it looks > > like we can just use F_RDAHEAD since Darwin seems to just distinguish 0 > > and !=0 case so that programmers won't have to use #ifdef or something > > else to get code working on different platform? > > What I dislike about the patch is the new kernel-private flag that is > eaten from the open(2) flags namespace. We do already have FHASLOCK, > so far the only such flag.

The new patch version against 7.2 is attached. Changes:

1) two fcntls: F_READAHEAD and the Darwin-compatible F_RDAHEAD;
2) FRDAHEAD uses the O_CREAT bit.

-- Igor Sysoev http://sysoev.ru/en/ -------------- next part --------------
--- /sys/sys/fcntl.h 2009-06-02 19:05:17.000000000 +0400
+++ /sys/sys/fcntl.h 2009-09-22 16:28:52.000000000 +0400
@@ -132,7 +132,7 @@
 /* bits to save after open */
 #define FMASK (FREAD|FWRITE|FAPPEND|FASYNC|FFSYNC|FNONBLOCK|O_DIRECT)
 /* bits settable by fcntl(F_SETFL, ...) */
-#define FCNTLFLAGS (FAPPEND|FASYNC|FFSYNC|FNONBLOCK|FPOSIXSHM|O_DIRECT)
+#define FCNTLFLAGS (FAPPEND|FASYNC|FFSYNC|FNONBLOCK|FPOSIXSHM|FRDAHEAD|O_DIRECT)
 #endif
 
 /*
@@ -163,6 +163,9 @@
  * implemented as plain files).
  */
 #define FPOSIXSHM O_NOFOLLOW
+
+/* Read ahead */
+#define FRDAHEAD O_CREAT
 #endif
 
 /*
@@ -187,6 +190,8 @@
 #define F_SETLK 12 /* set record locking information */
 #define F_SETLKW 13 /* F_SETLK; wait if blocked */
 #define F_SETLK_REMOTE 14 /* debugging support for remote locks */
+#define F_READAHEAD 15 /* read ahead */
+#define F_RDAHEAD 16 /* Darwin compatible read ahead */
 
 /* file descriptor flags (F_GETFD, F_SETFD) */
 #define FD_CLOEXEC 1 /* close-on-exec flag */
--- /sys/kern/vfs_vnops.c 2009-06-02 19:05:00.000000000 +0400
+++ /sys/kern/vfs_vnops.c 2009-09-22 14:08:03.000000000 +0400
@@ -305,6 +305,9 @@
 sequential_heuristic(struct uio *uio, struct file *fp)
 {
 
+	if (fp->f_flag & FRDAHEAD)
+		return(fp->f_seqcount << IO_SEQSHIFT);
+
 	if ((uio->uio_offset == 0 && fp->f_seqcount > 0) ||
 	    uio->uio_offset == fp->f_nextoff) {
 		/*
--- /sys/kern/kern_descrip.c 2009-08-28 18:50:11.000000000 +0400
+++ /sys/kern/kern_descrip.c 2009-09-22 14:17:47.000000000 +0400
@@ -411,6 +411,7 @@
 	u_int newmin;
 	int error, flg, tmp;
 	int vfslocked;
+	uint64_t bsize;
 
 	vfslocked = 0;
 	error = 0;
@@ -694,6 +695,35 @@
 		vfslocked = 0;
 		fdrop(fp, td);
 		break;
+
+	case F_RDAHEAD:
+		arg = arg ? 128 * 1024: 0;
+		/* FALLTHROUGH F_READAHEAD */
+
+	case F_READAHEAD:
+		FILEDESC_SLOCK(fdp);
+		if ((fp = fdtofp(fd, fdp)) == NULL) {
+			FILEDESC_SUNLOCK(fdp);
+			error = EBADF;
+			break;
+		}
+		if (fp->f_type != DTYPE_VNODE) {
+			FILEDESC_SUNLOCK(fdp);
+			error = EBADF;
+			break;
+		}
+		FILE_LOCK(fp);
+		if (arg) {
+			bsize = fp->f_vnode->v_mount->mnt_stat.f_iosize;
+			fp->f_seqcount = (arg + bsize - 1) / bsize;
+			fp->f_flag |= FRDAHEAD;
+		} else {
+			fp->f_flag &= ~FRDAHEAD;
+		}
+		FILE_UNLOCK(fp);
+		FILEDESC_SUNLOCK(fdp);
+		break;
+
 	default:
 		error = EINVAL;
 		break;

From pjd at FreeBSD.org Tue Sep 22 13:02:16 2009 From: pjd at FreeBSD.org (Pawel Jakub Dawidek) Date: Tue Sep 22 13:02:22 2009 Subject: ZFS group ownership In-Reply-To: <4AAB8AD0.5010302@zirakzigil.org> References: <4AAB8AD0.5010302@zirakzigil.org> Message-ID: <20090922130212.GK6038@garage.freebsd.pl> On Sat, Sep 12, 2009 at 01:49:36PM +0200, Giulio Ferro wrote: [...] > Now I try to do the same on a zfs partition on the same machine > This is what I see with ls
> ---------------------------------------------------------------
> ls -la
> total 4
> drwxrwx--- 3 www www 4 Sep 12 13:43 .
> drwxr-xr-x 4 root wheel 4 Sep 12 13:43 ..
> drwxrwx--- 2 gferro gferro 2 Sep 12 13:43 asda
> -rw-rw---- 1 gferro gferro 0 Sep 12 13:43 qweq
> ---------------------------------------------------------------
> > As you can see, both file and directory belongs now to "gferro" and > not "www". This means that other users won't even be able to read > my files / dir, let alone modify them. > > What I ask now is: is this a bug or a feature? This is a bug. I changed the default ZFS behaviour (which is SYSV) to match BSD behaviour (i.e. inherit group ownership from the parent directory), but it broke during the v6 -> v13 switch. Could you file a PR for this? I should be able to fix it before 8.0-RELEASE. -- Pawel Jakub Dawidek http://www.wheel.pl pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
-------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://lists.freebsd.org/pipermail/freebsd-hackers/attachments/20090922/2240dc91/attachment.pgp From jhb at freebsd.org Tue Sep 22 16:51:58 2009 From: jhb at freebsd.org (John Baldwin) Date: Tue Sep 22 16:52:31 2009 Subject: FreeBSD SVN repository mirror with local branches ability In-Reply-To: <969df365997cf2a4449b74c6af9e52e9@iem.pw.edu.pl> References: <969df365997cf2a4449b74c6af9e52e9@iem.pw.edu.pl> Message-ID: <200909220913.48154.jhb@freebsd.org> On Sunday 20 September 2009 10:43:01 am sojdaa wrote: > > Hello > > Like in the subject, I want to install SVN FreeBSD repo mirror and would > like to have the possibility to create my own branches, that will be merged > with synchronized local mirror. I've done this using svk after reading the > subversion primer: http://wiki.freebsd.org/SubversionPrimer, but I'm > wondering if there's any other possibilities to do this and avoid using > perl, because svk is a set of perl scripts. I wanted to keep the whole > system as simple as possible. I read some topics about git, mercurial and > communication with svn, but these are other versioning systems. > Unfortunately svnsync can create purely read-only mirrors. Is there any way > somehow to use only svn tools, like svnadmin, svnsync, but combine them to > create a mirror with write capabilities or there is no sense in trying > other tools than svk? I have used svk for this at ${JOB} and it works well for managing the mirror. 
-- John Baldwin From jhb at freebsd.org Tue Sep 22 16:52:01 2009 From: jhb at freebsd.org (John Baldwin) Date: Tue Sep 22 16:52:47 2009 Subject: genuine cpu I386_CPU kernel support In-Reply-To: <200909211203.n8LC3hhn090227@fire.js.berklix.net> References: <200909211203.n8LC3hhn090227@fire.js.berklix.net> Message-ID: <200909221027.48607.jhb@freebsd.org> On Monday 21 September 2009 8:03:43 am Julian H. Stacey wrote: > Hi, > Reference: > > From: Alexey Shuvaev > > Date: Fri, 18 Sep 2009 14:56:59 +0200 > > Message-id: <20090918125659.GA88218@wep4035.physik.uni-wuerzburg.de> > > Alexey Shuvaev wrote: > > On Fri, Sep 18, 2009 at 12:09:46AM +0200, Julian H. Stacey wrote: > > > Hi hackers, > > > I'm trying to get my Genuine 386 running 7.2. It currently runs 4.11. > > > 386 was first base of FreeBSD, a shame to lose it. > > > So far I've hacked diffs as below + the normal > > > /etc/make.conf CFLAGS += -march=i386 > > > cross compiled all bins libs etc & > > > setenv DESTDIR /usr/7.2 i > > > cd /usr/src/etc l make distrib-dirs > > > cd .. ; make install > > > But manually unloading 4.11 kernel & loading 7.2 kernel & booting > > > doesnt yet boot far enough to encourage me to move bins yet, > > > I think I need to do a bit more kernel before that ? > > > This is what I gave so far. Input welcome. > > > > > > [snip] > > > > > Have you already looked at svn r137784 (and possibly some later commits)? > > http://svn.freebsd.org/viewvc/base?view=revision&revision=137784 > > > > 0.02$, > > Alexey. > > Thanks Alexey, No I hadn't seen that. I had just a quick look so far. > I'll look more to see what tio change to compile my 80386 kernel. > PS I cc'd jhb@ who seems to be the one who removed 80386. > Maybe he has a patch set or comment. My comment is to just use 4.x (seriously). A true 386 is going to be quite slow and the overhead of many things added that work well on newer processors is going to be very painful on a 386 (probably on a 486 as well). 
4.x runs fine on a 386 and should support all the hardware you can stick into a machine with an 80386 CPU. -- John Baldwin From nate at thatsmathematics.com Tue Sep 22 18:22:20 2009 From: nate at thatsmathematics.com (Nate Eldredge) Date: Tue Sep 22 18:22:27 2009 Subject: genuine cpu I386_CPU kernel support In-Reply-To: <200909221027.48607.jhb@freebsd.org> References: <200909211203.n8LC3hhn090227@fire.js.berklix.net> <200909221027.48607.jhb@freebsd.org> Message-ID: On Tue, 22 Sep 2009, John Baldwin wrote: > My comment is to just use 4.x (seriously). A true 386 is going to be quite > slow and the overhead of many things added that work well on newer processors > is going to be very painful on a 386 (probably on a 486 as well). 4.x runs > fine on a 386 and should support all the hardware you can stick into a > machine with an 80386 CPU. Unless, of course, you plan to put it on a network. I doubt that 4.x is up to date with respect to security patches. -- Nate Eldredge nate@thatsmathematics.com From rpaulo at gmail.com Wed Sep 23 12:00:40 2009 From: rpaulo at gmail.com (Rui Paulo) Date: Wed Sep 23 12:23:17 2009 Subject: genuine cpu I386_CPU kernel support In-Reply-To: References: <200909211203.n8LC3hhn090227@fire.js.berklix.net> <200909221027.48607.jhb@freebsd.org> Message-ID: <5311D83C-0DB0-4D10-B2AB-B61FD37178F7@gmail.com> On 22 Sep 2009, at 19:03, Nate Eldredge wrote: > On Tue, 22 Sep 2009, John Baldwin wrote: > >> My comment is to just use 4.x (seriously). A true 386 is going to >> be quite >> slow and the overhead of many things added that work well on newer >> processors >> is going to be very painful on a 386 (probably on a 486 as well). >> 4.x runs >> fine on a 386 and should support all the hardware you can stick >> into a >> machine with an 80386 CPU. > > Unless, of course, you plan to put it on a network. I doubt that > 4.x is up to date with respect to security patches. I don't know if they were all applied on 4.x, but I think at least the older ones are. 
-- Rui Paulo From jhs at berklix.com Wed Sep 23 15:53:01 2009 From: jhs at berklix.com (Julian H. Stacey) Date: Wed Sep 23 15:53:08 2009 Subject: genuine cpu I386_CPU kernel support In-Reply-To: Your message "Wed, 23 Sep 2009 12:27:17 BST." <5311D83C-0DB0-4D10-B2AB-B61FD37178F7@gmail.com> Message-ID: <200909231554.n8NFsYwT078965@fire.js.berklix.net> Rui Paulo wrote: > On 22 Sep 2009, at 19:03, Nate Eldredge wrote: > > > On Tue, 22 Sep 2009, John Baldwin wrote: > > > >> My comment is to just use 4.x (seriously). A true 386 is going to > >> be quite > >> slow and the overhead of many things added that work well on newer > >> processors > >> is going to be very painful on a 386 (probably on a 486 as well). > >> 4.x runs > >> fine on a 386 and should support all the hardware you can stick > >> into a > >> machine with an 80386 CPU. > > > > Unless, of course, you plan to put it on a network. I doubt that > > 4.x is up to date with respect to security patches. > > I don't know if they were all applied on 4.x, but I think at least the > older ones are. 4.11 fell out of security support some while back, but http://www.freebsd.org/security/index.html only lists what's still in, not what fell out when. Free/ Net/ Open/ Dragon etc. all derive from Bill Jolitz's port of BSD to the 386. It would be nice if we could still keep that first platform walking, even if its speed can't be called running ;-) Maybe I'll get time to chase down all that came before http://svn.freebsd.org/viewvc/base?view=revision&revision=137784 Cheers, Julian -- Julian Stacey: BSD Unix Linux C Sys Eng Consultants Munich http://berklix.com Mail ASCII plain text not HTML & Base64. http://asciiribbon.org Virused Microsoft PCs cause spam.
http://berklix.com/free/ From rpaulo at gmail.com Wed Sep 23 16:08:14 2009 From: rpaulo at gmail.com (Rui Paulo) Date: Wed Sep 23 16:37:48 2009 Subject: genuine cpu I386_CPU kernel support In-Reply-To: <200909231554.n8NFsYwT078965@fire.js.berklix.net> References: <200909231554.n8NFsYwT078965@fire.js.berklix.net> Message-ID: On 23 Sep 2009, at 16:54, Julian H. Stacey wrote: > 4.11 fell out of security support some while back, but > http://www.freebsd.org/security/index.html > only lists what's still in, not what fell out when. Right, but IIRC there were some folks patching 4-STABLE after the security officer dropped it. -- Rui Paulo From jhb at freebsd.org Wed Sep 23 16:54:25 2009 From: jhb at freebsd.org (John Baldwin) Date: Wed Sep 23 16:54:44 2009 Subject: genuine cpu I386_CPU kernel support In-Reply-To: <200909231554.n8NFsYwT078965@fire.js.berklix.net> References: <200909231554.n8NFsYwT078965@fire.js.berklix.net> Message-ID: <200909231209.08346.jhb@freebsd.org> On Wednesday 23 September 2009 11:54:34 am Julian H. Stacey wrote: > Rui Paulo wrote: > > On 22 Sep 2009, at 19:03, Nate Eldredge wrote: > > > > > On Tue, 22 Sep 2009, John Baldwin wrote: > > > > > >> My comment is to just use 4.x (seriously). A true 386 is going to > > >> be quite > > >> slow and the overhead of many things added that work well on newer > > >> processors > > >> is going to be very painful on a 386 (probably on a 486 as well).
> > Free/ Net/ Open/ Dragon etc all derive from Bill Jollitz port of > BSD to 386. Would be nice if we could still keep that first platform > walking, even if speed can't be called running ;-) > > Maybe I'll get time to chase down all that came before > http://svn.freebsd.org/viewvc/base?view=revision&revision=137784 Other things added since then assume at least a 486. Not having cmpxchg is a bit of a killer. The umtx stuff used by libthr assumes it can do a cmpxchg in userland for example. One idea kicked around many years ago was catching the illegal instruction faults for userland and emulating cmpxchg, but that would be a good bit of work. FreeBSD now also makes liberal use of 'xadd' for reference counts (see refcount_*()) so you would need to support that on a 386 as well. There may be other places that I'm not aware of that have similar assumptions. FWIW, I would probably not be in favor of putting any patches into the tree if you do manage to get it all working. I suspect the userbase of FreeBSD/80386 is even smaller than FreeBSD/alpha or FreeBSD/sparc64 and 80386 support would add a lot of ugly #ifdef's for miniscule gain. -- John Baldwin From julian at elischer.org Wed Sep 23 17:21:57 2009 From: julian at elischer.org (Julian Elischer) Date: Wed Sep 23 17:22:05 2009 Subject: genuine cpu I386_CPU kernel support In-Reply-To: <200909231209.08346.jhb@freebsd.org> References: <200909231554.n8NFsYwT078965@fire.js.berklix.net> <200909231209.08346.jhb@freebsd.org> Message-ID: <4ABA5937.9000406@elischer.org> John Baldwin wrote: > On Wednesday 23 September 2009 11:54:34 am Julian H. Stacey wrote: >> Rui Paulo wrote: >>> On 22 Sep 2009, at 19:03, Nate Eldredge wrote: >>> >>>> On Tue, 22 Sep 2009, John Baldwin wrote: >>>> >>>>> My comment is to just use 4.x (seriously). A true 386 is going to >>>>> be quite >>>>> slow and the overhead of many things added that work well on newer >>>>> processors >>>>> is going to be very painful on a 386 (probably on a 486 as well). 
>>>>> 4.x runs >>>>> fine on a 386 and should support all the hardware you can stick >>>>> into a >>>>> machine with an 80386 CPU. >>>> Unless, of course, you plan to put it on a network. I doubt that >>>> 4.x is up to date with respect to security patches. >>> I don't know if they were all applied on 4.x, but I think at least the >>> older ones are. >> 4.11 fell out of security support some while back, but >> http://www.freebsd.org/security/index.html >> only lists what's still in, not what fell out when. >> >> Free/ Net/ Open/ Dragon etc all derive from Bill Jollitz port of >> BSD to 386. Would be nice if we could still keep that first platform >> walking, even if speed can't be called running ;-) >> >> Maybe I'll get time to chase down all that came before >> http://svn.freebsd.org/viewvc/base?view=revision&revision=137784 > > Other things added since then assume at least a 486. Not having cmpxchg is a > bit of a killer. I think a 386 can assume non-SMP in which case that can be simulated just fine :-) it also simplifies a lot of the other breakages.. #if (CPU == 80386) && defined(SMP) #error "can't have smp on a 386" #endif > The umtx stuff used by libthr assumes it can do a cmpxchg in > userland for example. One idea kicked around many years ago was catching the > illegal instruction faults for userland and emulating cmpxchg, but that would > be a good bit of work. FreeBSD now also makes liberal use of 'xadd' for > reference counts (see refcount_*()) so you would need to support that on a > 386 as well. There may be other places that I'm not aware of that have > similar assumptions. FWIW, I would probably not be in favor of putting any > patches into the tree if you do manage to get it all working. I suspect the > userbase of FreeBSD/80386 is even smaller than FreeBSD/alpha or > FreeBSD/sparc64 and 80386 support would add a lot of ugly #ifdef's for > miniscule gain. 
> From jhb at freebsd.org Wed Sep 23 18:38:00 2009 From: jhb at freebsd.org (John Baldwin) Date: Wed Sep 23 18:38:06 2009 Subject: genuine cpu I386_CPU kernel support In-Reply-To: <4ABA5937.9000406@elischer.org> References: <200909231554.n8NFsYwT078965@fire.js.berklix.net> <200909231209.08346.jhb@freebsd.org> <4ABA5937.9000406@elischer.org> Message-ID: <200909231436.29466.jhb@freebsd.org> On Wednesday 23 September 2009 1:21:59 pm Julian Elischer wrote: > John Baldwin wrote: > > On Wednesday 23 September 2009 11:54:34 am Julian H. Stacey wrote: > >> Rui Paulo wrote: > >>> On 22 Sep 2009, at 19:03, Nate Eldredge wrote: > >>> > >>>> On Tue, 22 Sep 2009, John Baldwin wrote: > >>>> > >>>>> My comment is to just use 4.x (seriously). A true 386 is going to > >>>>> be quite > >>>>> slow and the overhead of many things added that work well on newer > >>>>> processors > >>>>> is going to be very painful on a 386 (probably on a 486 as well). > >>>>> 4.x runs > >>>>> fine on a 386 and should support all the hardware you can stick > >>>>> into a > >>>>> machine with an 80386 CPU. > >>>> Unless, of course, you plan to put it on a network. I doubt that > >>>> 4.x is up to date with respect to security patches. > >>> I don't know if they were all applied on 4.x, but I think at least the > >>> older ones are. > >> 4.11 fell out of security support some while back, but > >> http://www.freebsd.org/security/index.html > >> only lists what's still in, not what fell out when. > >> > >> Free/ Net/ Open/ Dragon etc all derive from Bill Jollitz port of > >> BSD to 386. Would be nice if we could still keep that first platform > >> walking, even if speed can't be called running ;-) > >> > >> Maybe I'll get time to chase down all that came before > >> http://svn.freebsd.org/viewvc/base?view=revision&revision=137784 > > > > Other things added since then assume at least a 486. Not having cmpxchg is a > > bit of a killer. 
> > I think a 386 can assume non-SMP in which case that can be simulated > just fine :-) > it also simplifies a lot of the other breakages.. > > #if (CPU == 80386) && defined(SMP) > #error "can't have smp on a 386" > #endif No, it actually does not. The in-kernel version of cmpset for 386 was to disable interrupts while doing a cmp and jmp around a mov (even 386's have preemption, so you do have to disable interrupts). You can't do that in userland (cli is a privileged instruction), which probably mandates doing a cmpxchg emulator in the kernel for userland code. That and disabling interrupts is actually far less efficient than spl() for a UP 80386 machine. I suspect newer kernels will run slower on an 80386 than 4.x. -- John Baldwin From nate at thatsmathematics.com Wed Sep 23 19:08:54 2009 From: nate at thatsmathematics.com (Nate Eldredge) Date: Wed Sep 23 19:09:01 2009 Subject: genuine cpu I386_CPU kernel support In-Reply-To: <200909231436.29466.jhb@freebsd.org> References: <200909231554.n8NFsYwT078965@fire.js.berklix.net> <200909231209.08346.jhb@freebsd.org> <4ABA5937.9000406@elischer.org> <200909231436.29466.jhb@freebsd.org> Message-ID: On Wed, 23 Sep 2009, John Baldwin wrote: > On Wednesday 23 September 2009 1:21:59 pm Julian Elischer wrote: >> John Baldwin wrote: >>> Other things added since then assume at least a 486. Not having cmpxchg is a >>> bit of a killer. >> >> I think a 386 can assume non-SMP in which case that can be simulated >> just fine :-) >> it also simplifies a lot of the other breakages.. >> >> #if (CPU == 80386) && defined(SMP) >> #error "can't have smp on a 386" >> #endif > > No, it actually does not. The in-kernel version of cmpset for 386 was to > disable interrupts while doing a cmp and jmp around a mov (even 386's have > preemption, so you do have to disable interrupts). You can't do that in > userland (cli is a privileged instruction), which probably mandates doing a > cmpxchg emulator in the kernel for userland code. 
That and disabling > interrupts is actually far less efficient than spl() for a UP 80386 machine. > I suspect newer kernels will run slower on an 80386 than 4.x. Another issue that I know affected Linux is that the 386 would allow kernel code (CPL 0) to write to a page that was marked read-only. The 486 and later would generate a page fault. Linux takes advantage of the 486 behavior to avoid having to do explicit access checks when copying to user space, though AFAIK it checks the CPU at boot time to decide if this can be done. I haven't checked whether FreeBSD uses this feature, but it would be another thing to watch out for. -- Nate Eldredge nate@thatsmathematics.com From tom at tomjudge.com Wed Sep 23 19:47:46 2009 From: tom at tomjudge.com (Tom Judge) Date: Wed Sep 23 19:47:57 2009 Subject: Adding support for the Intel SS4000-E NAS aka the EM-7210 Message-ID: <4ABA76B7.5050008@tomjudge.com> Hi, Sorry for the cross post but i didn't get any bites on arm@ so I am gonna try my luck here. I am trying to add support for the Intel SS4000-E/EM7210 to FreeBSD, I have copied all of the files for the EP80219 as this seems to be what the board is based on, and modified the interrupt assignment code. See the patch agains sys/arm here: http://www.tomjudge.com/tmp/em-7210-patch I am also having trouble with loading the kernel at the default location, (instructions from here: http://wiki.freebsd.org/FreeBSDAvila) as RedBoot reports that there is no memory at the load address. Currently when I try to load this kernel at the phys address using reboot and try to run the kernel I get nothing outputed on the console. I used the phys address from the addresses that the linux kernel is loaded into out of flash. (see output below) Could someone please give me some hints on what I am doing wrong and need to change? The board is currently supported by the Linux kernel (as the em7210.c) so I was hoping that adding support to FreeBSD would not be difficult. 
More information about the system can be found here: http://em7210.kwaak.net/ Thanks in advance for any help, Tom

$ sudo cu -l cuau0 -s 115200
Password:
Connected
+No network interfaces found
EM-7210 ver.T04 2005-12-12 (For ver.AA)
== Executing boot script in 1.000 seconds - enter ^C to abort
^C
RedBoot> ^C
RedBoot> fis load rammode
RedBoot> go
+Ethernet eth0: MAC address 00:0e:0c:b6:bf:1a
IP: 10.9.9.1/255.255.255.0, Gateway: 10.9.9.1
Default server: 10.9.9.10, DNS server IP: 0.0.0.0
EM-7210 (RAM mode) 2005-12-22
== Executing boot script in 1.000 seconds - enter ^C to abort
^C
RedBoot> ^C
RedBoot> fis list
Name              FLASH addr  Mem addr    Length      Entry point
RedBoot           0xF0000000  0xF0000000  0x00040000  0x00000000
RedBoot config    0xF1FC0000  0xF1FC0000  0x00001000  0x00000000
FIS directory     0xF1FE0000  0xF1FE0000  0x00020000  0x00000000
rammode           0xF0060000  0x00200000  0x00040000  0x00200000
log               0xF0040000  0xF0040000  0x00020000  0x00000000
naskey            0xF00A0000  0xF00A0000  0x00020000  0x01008000
zImage            0xF00C0000  0x01008000  0x00200000  0x01008000
ramdisk.gz        0xF02C0000  0x01800000  0x00400000  0x01800000
vendor            0xF06C0000  0xF06C0000  0x01880000  0x01800000
wmdata            0xF1F40000  0xF1F40000  0x00080000  0x01800000
RedBoot>

From linimon at lonesome.com Wed Sep 23 19:35:09 2009 From: linimon at lonesome.com (Mark Linimon) Date: Wed Sep 23 20:03:57 2009 Subject: genuine cpu I386_CPU kernel support In-Reply-To: <200909231554.n8NFsYwT078965@fire.js.berklix.net> References: <5311D83C-0DB0-4D10-B2AB-B61FD37178F7@gmail.com> <200909231554.n8NFsYwT078965@fire.js.berklix.net> Message-ID: <20090923191822.GA11443@lonesome.com> On Wed, Sep 23, 2009 at 05:54:34PM +0200, Julian H. Stacey wrote: > 4.11 fell out of security support some while back, but > http://www.freebsd.org/security/index.html > only lists what's still in, not what fell out when. Then see http://people.freebsd.org/~linimon/schedule/milestones.html. (Yes, I know the data for 7.2 and 8.0 are stale.)
4.11 support was extended again and again but ended 01/31/2007. Towards the end it was consuming a lot of people's time to support it, since everything newer had changed dramatically. > Free/ Net/ Open/ Dragon etc all derive from Bill Jollitz port of > BSD to 386. Would be nice if we could still keep that first platform > walking, even if speed can't be called running ;-) The same comment applies. Everything has changed dramatically. > Maybe I'll get time to chase down all that came before > http://svn.freebsd.org/viewvc/base?view=revision&revision=137784 I honestly can't see why you would want to waste your time like this, but it's yours to waste I suppose. (Even a notorious packrat like me has gotten rid of hardware from that era.) mcl From tom at tomjudge.com Wed Sep 23 22:45:27 2009 From: tom at tomjudge.com (Tom Judge) Date: Wed Sep 23 22:45:34 2009 Subject: USB Device identification in dmesg and usbconfig Message-ID: <4ABAA4E6.20809@tomjudge.com> Hi, I have been working on getting at least some support for the Function (F1-12) keys on my MS Natural 4000 keyboard. Here is the original PR on the subject: usb/116947. My patch can be found here: http://svn.tomjudge.com/freebsd/patches/ms-natural-4000/usb-natural4000.patch and I have submitted an update to the PR. When I reboot into the kernel the quirk is detected correctly and the function keys work. 
However the device does not seem to be correctly identified here is the dmesg output: ugen2.3: at usbus2 ukbd0: on usbus2 kbd2 at ukbd0 uhid0: on usbus2 Here is usbconfig list output: ugen2.3: at usbus2, cfg=0 md=HOST spd=LOW (1.5Mbps) pwr=ON How do I get the output to match other devices like this: ugen2.4: at usbus2 ums0: on usbus2 ums0: 3 buttons and [XYZ] coordinates ID=17 Thanks Tom From bruce at cran.org.uk Wed Sep 23 23:52:10 2009 From: bruce at cran.org.uk (Bruce Cran) Date: Wed Sep 23 23:52:17 2009 Subject: USB Device identification in dmesg and usbconfig In-Reply-To: <4ABAA4E6.20809@tomjudge.com> References: <4ABAA4E6.20809@tomjudge.com> Message-ID: <20090924003351.000056a3@unknown> On Wed, 23 Sep 2009 22:44:54 +0000 Tom Judge wrote: > Hi, > > I have been working on getting at least some support for the Function > (F1-12) keys on my MS Natural 4000 keyboard. Here is the original PR > on the subject: usb/116947. My patch can be found here: > http://svn.tomjudge.com/freebsd/patches/ms-natural-4000/usb-natural4000.patch > and I have submitted an update to the PR. > > > When I reboot into the kernel the quirk is detected correctly and the > function keys work. > > However the device does not seem to be correctly identified here is > the dmesg output: > > ugen2.3: at usbus2 > ukbd0: 3> on usbus2 > kbd2 at ukbd0 > uhid0: 3> on usbus2 > > Here is usbconfig list output: > > ugen2.3: at usbus2, cfg=0 md=HOST > spd=LOW (1.5Mbps) pwr=ON > > > How do I get the output to match other devices like this: > > ugen2.4: at usbus2 > ums0: 2.00/1.20, addr 4> on usbus2 > ums0: 3 buttons and [XYZ] coordinates ID=17 I'm starting to suspect this is a bug in the USB code that Microsoft devices use. I've seen this on two PCs now, both on 7.x and 8.0-RC1; sometimes they'll identify properly by getting the strings out of the device (e.g. "Microsoft 3-Button Mouse with IntelliEye(TM)") but most of the time I'll just see the generic device and product IDs. 
-- Bruce Cran From alexbestms at math.uni-muenster.de Thu Sep 24 09:28:34 2009 From: alexbestms at math.uni-muenster.de (Alexander Best) Date: Thu Sep 24 09:28:42 2009 Subject: HAMMER FS port (status ?) Message-ID: i remember a discussion about HAMMER support on one of the mailingslists which sorta ended with the following statement: "let's get zfs running properly before we even think about starting with HAMMER." cheers. alex From des at des.no Thu Sep 24 10:49:40 2009 From: des at des.no (Dag-Erling Smørgrav) Date: Thu Sep 24 10:49:46 2009 Subject: HAMMER FS port (status ?) In-Reply-To: (Alexander Best's message of "Thu, 24 Sep 2009 11:28:31 +0200 (CEST)") References: Message-ID: <86fxac62il.fsf@ds4.des.no> Alexander Best writes: > i remember a discussion about HAMMER support on one of the mailingslists which > sorta ended with the following statement: > > "let's get zfs running properly before we even think about starting with > HAMMER." Not a valid argument; regardless of the state of ZFS, one does not preclude the other unless you expect the same person to handle both. The one and only reason why HAMMER is not in the base system is that nobody has stepped forward to do it. DES -- Dag-Erling Smørgrav - des@des.no From jhs at berklix.com Thu Sep 24 11:07:33 2009 From: jhs at berklix.com (Julian H. Stacey) Date: Thu Sep 24 11:07:41 2009 Subject: genuine cpu I386_CPU kernel support In-Reply-To: Your message "Wed, 23 Sep 2009 14:18:22 CDT." <20090923191822.GA11443@lonesome.com> Message-ID: <200909241109.n8OB98ww092018@fire.js.berklix.net> > I honestly can't see why you would want to waste your time like this, > but it's yours to waste I suppose. (Even a notorious packrat like me > has gotten rid of hardware from that era.) > > mcl Hmm, So that's you & jhb warning me off. Well I do have a ToDo list that's a mile long, so maybe I'd best take your advice :-) Thanks all though, for the informed kernel/ assembler comment we've been reading.
Cheers, Julian -- Julian Stacey: BSD Unix Linux C Sys Eng Consultants Munich http://berklix.com Mail ASCII plain text not HTML & Base64. http://asciiribbon.org Virused Microsoft PCs cause spam. http://berklix.com/free/ From alexbestms at math.uni-muenster.de Thu Sep 24 15:09:26 2009 From: alexbestms at math.uni-muenster.de (Alexander Best) Date: Thu Sep 24 15:09:34 2009 Subject: HAMMER FS port (status ?) In-Reply-To: <86fxac62il.fsf@ds4.des.no> Message-ID: Dag-Erling Smørgrav wrote on 2009-09-24: > Alexander Best writes: > > i remember a discussion about HAMMER support on one of the > > mailingslists which > > sorta ended with the following statement: > > "let's get zfs running properly before we even think about starting > > with > > HAMMER." > Not a valid argument; regardless of the state of ZFS, one does not > preclude the other unless you expect the same person to handle both. > The one and only reason why HAMMER is not in the base system is that > nobody has stepped forward to do it. > DES my thoughts exactly. i'd really like to use HAMMER as a ufs2 replacement on my box. zfs seems like a great fs but to me it seems too complicated to be handled by the average user. i'm really looking forward to the first few lines of code in p4. ;) cheers. alex From danger at FreeBSD.org Thu Sep 24 15:32:15 2009 From: danger at FreeBSD.org (Daniel Gerzo) Date: Thu Sep 24 15:32:52 2009 Subject: HEADSUP: Call for FreeBSD Status Reports Message-ID: <4ABB8B31.7050505@FreeBSD.org> Dear all, I would like to remind you to submit your status reports as soon as possible. Long time has passed since the last status reports were released; and surely a lot has had happened since then. Our developers are relaxed after DevSummit and EuroBSDCon in Cambridge, which both were great! I believe a lot of stuff has been discussed during these events (I hope we will have reports covering this too) and since the last report a lot of things have happened.
During that time, two other conferences have been held (BSDCan and AsiaBSDCon), we have released 7.2, not to mention that 8.0 is behind the door. Google Summer of Code should be finished by now too, and we would like to hear about its results. Surely there are a lot more projects which are currently being worked on, so please do not hesitate and write us a few lines - a short description about what you are working on, what are the plans and goals, so we can inform our community about your great work! It's useful for you as well as our users! Please note, the submissions for this quarter (well...rather halfyear, because we should now cover 4-9/2009) are due by October 7th, 2009. Please post the filled-in XML template to be found at http://www.freebsd.org/news/status/report-sample.xml to monthly@FreeBSD.org, or alternatively use our web based form at http://www.freebsd.org/cgi/monthly.cgi. We are looking forward to see your submissions! -- S pozdravom / Best regards Daniel Gerzo, FreeBSD committer From gnemmi at gmail.com Thu Sep 24 15:38:20 2009 From: gnemmi at gmail.com (Gonzalo Nemmi) Date: Thu Sep 24 15:38:27 2009 Subject: HAMMER FS port (status ?) In-Reply-To: References: Message-ID: <200909241238.15457.gnemmi@gmail.com> On Thursday 24 September 2009 6:28:31 am Alexander Best wrote: > i remember a discussion about HAMMER support on one of the > mailingslists which sorta ended with the following statement: > > "let's get zfs running properly before we even think about starting > with HAMMER." > > cheers. > alex > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to > "freebsd-hackers-unsubscribe@freebsd.org" Yup .. that was basically it ... 
Here you go: http://lists.freebsd.org/pipermail/freebsd-stable/2008-October/045550.html Regards -- Blessings Gonzalo Nemmi From kraduk at googlemail.com Thu Sep 24 16:16:28 2009 From: kraduk at googlemail.com (krad) Date: Thu Sep 24 16:41:43 2009 Subject: HAMMER FS port (status ?) In-Reply-To: <200909241238.15457.gnemmi@gmail.com> References: <200909241238.15457.gnemmi@gmail.com> Message-ID: 2009/9/24 Gonzalo Nemmi > On Thursday 24 September 2009 6:28:31 am Alexander Best wrote: > > i remember a discussion about HAMMER support on one of the > > mailingslists which sorta ended with the following statement: > > > > "let's get zfs running properly before we even think about starting > > with HAMMER." > > > > cheers. > > alex > > Yup .. that was basically it ... > Here you go: > http://lists.freebsd.org/pipermail/freebsd-stable/2008-October/045550.html > > Regards > -- > Blessings > Gonzalo Nemmi > If the istallation get sorted out for zfs its argubly the easiest fs I have ever had to manage. There are lots of options but you dont have to use them and if you dont it wont hurt the average user. If you do decide to start using them at a later date, then its dead easy to. THe main problem with zfs is resources.
If it wasn't for this then there wouldn't be a good reason not to use zfs default fs (i await the flames 8) From leandro.magnabosco at fcdl-sc.org.br Thu Sep 24 16:54:26 2009 From: leandro.magnabosco at fcdl-sc.org.br (Leandro Quibem Magnabosco) Date: Thu Sep 24 16:54:33 2009 Subject: HAMMER FS port (status ?) In-Reply-To: <200909241238.15457.gnemmi@gmail.com> References: <200909241238.15457.gnemmi@gmail.com> Message-ID: <4ABB9FC9.3060006@fcdl-sc.org.br> I think that one questions pops into the minds of a lot of people right now: Why not just use DragonFly BSD? It is a pretty decent system. Why do you need it to be FreeBSD w/ Hammer and not DragonFly BSD? Maybe there are some reasons, but I don't see it. Could anybody point it out for me? Thank you. -- Leandro Quibem Magnabosco. From luizgustavo at luizgustavo.pro.br Thu Sep 24 20:03:45 2009 From: luizgustavo at luizgustavo.pro.br (Luiz Gustavo S. Costa) Date: Thu Sep 24 20:03:52 2009 Subject: HAMMER FS port (status ?) In-Reply-To: <4ABB9FC9.3060006@fcdl-sc.org.br> References: <200909241238.15457.gnemmi@gmail.com> <4ABB9FC9.3060006@fcdl-sc.org.br> Message-ID: <772ca7d0909241303q3ed9986fn66a79134fb9a417e@mail.gmail.com> Hi.... 2009/9/24 Leandro Quibem Magnabosco : > I think that one questions pops into the minds of a lot of people right now: > Why not just use DragonFly BSD? > > It is a pretty decent system. > Why do you need it to be FreeBSD w/ Hammer and not DragonFly BSD? exist very differences between FreeBSD and DragonFlyBSD.... DragonFly = fork from freebsd 4.x and other line of develop and ports/pksrc, and, and, and.... > > > Maybe there are some reasons, but I don't see it. > Could anybody point it out for me? > i use HAMMER on one freebsd > > Thank you. > -- > Leandro Quibem Magnabosco. 
> -- Luiz Gustavo Costa (Powered by BSD) *+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+ mundoUnix - Consultoria em Software Livre http://www.mundounix.com.br ICQ: 2890831 / MSN: contato@mundounix.com.br From linimon at lonesome.com Thu Sep 24 20:35:10 2009 From: linimon at lonesome.com (Mark Linimon) Date: Thu Sep 24 20:42:16 2009 Subject: HAMMER FS port (status ?) In-Reply-To: <4ABB9FC9.3060006@fcdl-sc.org.br> References: <200909241238.15457.gnemmi@gmail.com> <4ABB9FC9.3060006@fcdl-sc.org.br> Message-ID: <20090924201558.GA12560@lonesome.com> On Thu, Sep 24, 2009 at 01:35:21PM -0300, Leandro Quibem Magnabosco wrote: > I think that one questions pops into the minds of a lot of people right > now: Why not just use DragonFly BSD? Feel free, but take it off-list, please. mcl From leandro.magnabosco at fcdl-sc.org.br Thu Sep 24 20:42:27 2009 From: leandro.magnabosco at fcdl-sc.org.br (Leandro Quibem Magnabosco) Date: Thu Sep 24 20:42:34 2009 Subject: HAMMER FS port (status ?) In-Reply-To: <20090924201558.GA12560@lonesome.com> References: <200909241238.15457.gnemmi@gmail.com> <4ABB9FC9.3060006@fcdl-sc.org.br> <20090924201558.GA12560@lonesome.com> Message-ID: <4ABBD9B1.2000802@fcdl-sc.org.br> Mark Linimon wrote: > On Thu, Sep 24, 2009 at 01:35:21PM -0300, Leandro Quibem Magnabosco wrote: > >> I think that one questions pops into the minds of a lot of people right >> now: Why not just use DragonFly BSD? >> > > Feel free, but take it off-list, please. > > mcl > We (me and Luiz) did that already. Thanks and sorry.
:) From stef-list at memberwebs.com Thu Sep 24 20:45:30 2009 From: stef-list at memberwebs.com (Stef Walter) Date: Thu Sep 24 20:45:36 2009 Subject: Is the FreeBSD ABI compatibility policy documented anywhere Message-ID: <4ABBD5FA.5070507@memberwebs.com> It seems that FreeBSD has an ABI compatibility policy where major versions remain ABI and API compatible throughout minor point versions. That is to say that the kernel interfaces and libraries for (eg) 7-STABLE, 7.1-RELEASE, 7.2-RELEASE are not supposed to change. Is this a policy of the project? If so, is it documented anywhere? Or is it just a convention? Cheers, Stef From julian at elischer.org Thu Sep 24 21:00:02 2009 From: julian at elischer.org (Julian Elischer) Date: Thu Sep 24 21:00:09 2009 Subject: Is the FreeBSD ABI compatibility policy documented anywhere In-Reply-To: <4ABBD5FA.5070507@memberwebs.com> References: <4ABBD5FA.5070507@memberwebs.com> Message-ID: <4ABBDDD4.50905@elischer.org> Stef Walter wrote: > It seems that FreeBSD has an ABI compatibility policy where major > versions remain ABI and API compatible throughout minor point versions. > That is to say that the kernel interfaces and libraries for (eg) > 7-STABLE, 7.1-RELEASE, 7.2-RELEASE are not supposed to change. > > Is this a policy of the project? If so, is it documented anywhere? Or is > it just a convention? It is a policy of the project but I don't think our policies are written down as such. I think you will find it referenced in many places in a sideways manner rather than directly. 
Possibly in the developer handbook > > Cheers, > > Stef > > From kris at FreeBSD.org Thu Sep 24 22:36:04 2009 From: kris at FreeBSD.org (Kris Kennaway) Date: Thu Sep 24 22:36:11 2009 Subject: genuine cpu I386_CPU kernel support In-Reply-To: <4ABA5937.9000406@elischer.org> References: <200909231554.n8NFsYwT078965@fire.js.berklix.net> <200909231209.08346.jhb@freebsd.org> <4ABA5937.9000406@elischer.org> Message-ID: <4ABBF457.8010000@FreeBSD.org> Julian Elischer wrote: > I think a 386 can assume non-SMP in which case that can be simulated > just fine :-) > it also simplifies a lot of the other breakages.. > > #if (CPU == 80386) && defined(SMP) > #error "can't have smp on a 386" > #endif Paging Terry Lambert...Terry Lambert, to the hackers lounge please. Kris From fabio at freebsd.org Thu Sep 24 23:46:14 2009 From: fabio at freebsd.org (Fabio Checconi) Date: Thu Sep 24 23:46:20 2009 Subject: sx locks and memory barriers Message-ID: <20090924224935.GW473@gandalf.sssup.it> Hi all, looking at sys/sx.h I have some troubles understanding this comment: * A note about memory barriers. Exclusive locks need to use the same * memory barriers as mutexes: _acq when acquiring an exclusive lock * and _rel when releasing an exclusive lock. On the other side, * shared lock needs to use an _acq barrier when acquiring the lock * but, since they don't update any locked data, no memory barrier is * needed when releasing a shared lock.
In particular, I'm not understanding what prevents the following sequence from happening:

CPU A                              CPU B

sx_slock(&data->lock);

sx_sunlock(&data->lock);

/* reordered after the unlock
   by the cpu */
if (data->buffer)
                                   sx_xlock(&data->lock);
                                   free(data->buffer);
                                   data->buffer = NULL;
                                   sx_xunlock(&data->lock);

        a = *data->buffer;

IOW, even if readers do not modify the data protected by the lock, without a release barrier a memory access may leak past the unlock (as the cpu won't notice any dependency between the unlock and the fetch, feeling free to reorder them), thus potentially racing with an exclusive writer accessing the data. On architectures where atomic ops serialize memory accesses this would never happen, otherwise the sequence above seems possible; am I missing something? From bruce at cran.org.uk Fri Sep 25 01:37:12 2009 From: bruce at cran.org.uk (Bruce Cran) Date: Fri Sep 25 01:37:19 2009 Subject: Is the FreeBSD ABI compatibility policy documented anywhere In-Reply-To: <4ABBDDD4.50905@elischer.org> References: <4ABBD5FA.5070507@memberwebs.com> <4ABBDDD4.50905@elischer.org> Message-ID: <20090925023711.0000515b@unknown> On Thu, 24 Sep 2009 14:00:04 -0700 Julian Elischer wrote: > Stef Walter wrote: > > It seems that FreeBSD has an ABI compatibility policy where major > > versions remain ABI and API compatible throughout minor point > > versions. That is to say that the kernel interfaces and libraries > > for (eg) 7-STABLE, 7.1-RELEASE, 7.2-RELEASE are not supposed to > > change. > > > > Is this a policy of the project? If so, is it documented anywhere? > > Or is it just a convention? > > It is a policy of the project but I don't think our policies are > written down as such. I think you will find it referenced in > many places in a sideways manner rather than directly.
> > Possibly in the developer handbook The only place I found it directly referenced was in http://wiki.freebsd.org/VendorInformation -- Bruce Cran From luizgustavo at luizgustavo.pro.br Fri Sep 25 02:42:49 2009 From: luizgustavo at luizgustavo.pro.br (Luiz Gustavo S. Costa) Date: Fri Sep 25 02:42:56 2009 Subject: altq over vlan: patch exists ? Message-ID: <772ca7d0909241942n5ce78cc9sd9855bdd4c1e9c26@mail.gmail.com> Hi guys, The configuration Altq on one interface VLAN is working on OpenBSD and DragonFlyBSD, but FreeBSD no ! exists any patch for this ? or .. why no working ? any reason ? thanx -- Luiz Gustavo Costa (Powered by BSD) *+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+ mundoUnix - Consultoria em Software Livre http://www.mundounix.com.br ICQ: 2890831 / MSN: contato@mundounix.com.br From rnoland at FreeBSD.org Fri Sep 25 11:47:46 2009 From: rnoland at FreeBSD.org (Robert Noland) Date: Fri Sep 25 11:48:20 2009 Subject: sx locks and memory barriers In-Reply-To: <20090924224935.GW473@gandalf.sssup.it> References: <20090924224935.GW473@gandalf.sssup.it> Message-ID: <1253877997.2031.2627.camel@balrog.2hip.net> On Fri, 2009-09-25 at 00:49 +0200, Fabio Checconi wrote: > Hi all, > looking at sys/sx.h I have some troubles understanding this comment: > > * A note about memory barriers. Exclusive locks need to use the same > * memory barriers as mutexes: _acq when acquiring an exclusive lock > * and _rel when releasing an exclusive lock. On the other side, > * shared lock needs to use an _acq barrier when acquiring the lock > * but, since they don't update any locked data, no memory barrier is > * needed when releasing a shared lock. 
>
> In particular, I'm not understanding what prevents the following sequence
> from happening:
>
>         CPU A                           CPU B
>
>         sx_slock(&data->lock);
>
>         sx_sunlock(&data->lock);
>
>         /* reordered after the unlock
>            by the cpu */
>         if (data->buffer)
>                                         sx_xlock(&data->lock);
>                                         free(data->buffer);
>                                         data->buffer = NULL;
>                                         sx_xunlock(&data->lock);
>
>                 a = *data->buffer;
>
> IOW, even if readers do not modify the data protected by the lock,
> without a release barrier a memory access may leak past the unlock (as
> the cpu won't notice any dependency between the unlock and the fetch,
> feeling free to reorder them), thus potentially racing with an exclusive
> writer accessing the data.

Maybe I am missing something subtle, but shouldn't the shared lock be held for all data access if you want to guarantee sanity? Meaning, if you are accessing data->* without any locks held, all bets are off.

robert.

> On architectures where atomic ops serialize memory accesses this would
> never happen, otherwise the sequence above seems possible; am I missing
> something?
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
-- Robert Noland FreeBSD From rysto32 at gmail.com Fri Sep 25 13:57:02 2009 From: rysto32 at gmail.com (Ryan Stone) Date: Fri Sep 25 13:57:09 2009 Subject: sx locks and memory barriers In-Reply-To: References: <20090924224935.GW473@gandalf.sssup.it> <1253877997.2031.2627.camel@balrog.2hip.net> Message-ID: The code that Fabio proposes looks like this:

        sx_slock(&data->lock);
        if (data->buffer)
                a = *data->buffer;
        sx_sunlock(&data->lock);

The point is that without a memory barrier on the unlock, the CPU is free to reorder the instructions into the order in his message.
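To make the ordering argument concrete, here is a minimal sketch in C11 atomics of a hypothetical reader/writer spinlock. The sketch_* names and the state encoding (-1 for an exclusive holder, otherwise the number of shared holders) are assumptions for illustration only, not sx(9) internals. The shared unlock is given release semantics, which is what Fabio argues is required; whether sx(9) supplies equivalent ordering on a given architecture is exactly the open question in this thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical reader/writer spinlock, used only to illustrate the
 * ordering argument; it is NOT FreeBSD's sx(9) implementation.
 * state == -1: held exclusively; state >= 0: number of shared holders.
 */
struct sketch_rwlock {
	atomic_int state;
};

static void
sketch_init(struct sketch_rwlock *l)
{
	atomic_init(&l->state, 0);
}

static void
sketch_slock(struct sketch_rwlock *l)
{
	int v = atomic_load_explicit(&l->state, memory_order_relaxed);

	for (;;) {
		if (v < 0) {
			/* A writer holds the lock: re-read and spin. */
			v = atomic_load_explicit(&l->state,
			    memory_order_relaxed);
			continue;
		}
		/* Acquire: loads in the critical section stay below this. */
		if (atomic_compare_exchange_weak_explicit(&l->state, &v,
		    v + 1, memory_order_acquire, memory_order_relaxed))
			return;
		/* On failure v now holds the freshly observed state. */
	}
}

static void
sketch_sunlock(struct sketch_rwlock *l)
{
	/*
	 * Release: every load made under the shared lock (such as the
	 * *data->buffer dereference) must complete before a writer can
	 * see the reader count drop.  With memory_order_relaxed here,
	 * a weakly ordered CPU could let that load drift past the
	 * unlock, which is the race in the CPU A / CPU B trace.
	 */
	int prev = atomic_fetch_sub_explicit(&l->state, 1,
	    memory_order_release);

	assert(prev > 0);
	(void)prev;
}

static bool
sketch_try_xlock(struct sketch_rwlock *l)
{
	int expected = 0;

	/* Acquire pairs with the release in sketch_sunlock(). */
	return (atomic_compare_exchange_strong_explicit(&l->state,
	    &expected, -1, memory_order_acquire, memory_order_relaxed));
}

static void
sketch_xunlock(struct sketch_rwlock *l)
{
	assert(atomic_load_explicit(&l->state, memory_order_relaxed) == -1);
	atomic_store_explicit(&l->state, 0, memory_order_release);
}
```

The writer's acquiring compare-and-swap reads the value left by the last reader's release decrement, and successive decrements extend the release sequence. As a result, every load a reader issued before unlocking happens-before the writer's subsequent free() and store.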
From ed at 80386.nl Fri Sep 25 14:28:53 2009 From: ed at 80386.nl (Ed Schouten) Date: Fri Sep 25 14:29:00 2009 Subject: Testers wanted: xterm-style emulator! Message-ID: <20090925142852.GR95398@hoeg.nl> Hi folks, I just committed a small patch for the Syscons terminal emulator that allows you all to test an xterm-style terminal emulator without requiring any recompilation of your kernel (just make sure you run HEAD at r197481 or later). I am considering making the xterm-style emulator the default at some point in the future, because it has the following advantages:

- Even though a larger set of instructions is a pain to implement, it does reduce bandwidth. When you use the xterm-style emulator, applications can use things like scrolling regions to scroll selected parts of the screen. This means that applications like screen(1), minicom(1), vi(1) (read: apps with status lines at the top/bottom) don't need to generate massive amounts of data each time you need to scroll.
- Because 99% of all graphical terminal emulators use xterm-style emulation as well, you can finally use tools like dtach(1) between the console and X11 without any problems. dtach(1) doesn't perform any terminal emulation. It just forwards data.
- You can finally SSH/telnet/rlogin/cu/etc. to devices such as switches and other operating systems without getting artifacts or termcap issues.
- It makes it easier for us to eventually get Unicode working. Combining cons25 and Unicode is hard, because at least our termcap entry uses things like 8-bit CP437 box drawing (ACS).

There are still some small things broken with the xterm-style emulator, but it shouldn't be too bad. I've been using it for more than half a year or so. Known issues are:

- The cursor keys, F1 to F12, insert, delete, home, end, page up, page down, etc. may not always work as expected. I'll look into this soon.
- Box drawing *should* work the way it did before, but if you load different fonts, it may display the incorrect glyphs.
  I don't consider this to be a real bug, because this problem also exists when using cons25.

How to help out:

- Make sure you run FreeBSD HEAD r197481 or later.
- Log in on the console.
- Run the following commands:

    printf '\033[=T'
    export TERM=xterm

- Just do the stuff you normally do and report any rendering issues that show up. Please give a detailed explanation of the programs you ran and what you had to do to trigger the issue. You can also use applications like tee(1) to capture display output.
- If you want to stop testing:

    printf '\033[=1T'
    export TERM=cons25

You can also activate the xterm-style emulation by default. All you need to do is compile your kernel with the TEKEN_XTERM option set. Be sure to update your /etc/ttys to list xterm instead of cons25.

Thanks! -- Ed Schouten WWW: http://80386.nl/ From rnoland at FreeBSD.org Fri Sep 25 14:40:29 2009 From: rnoland at FreeBSD.org (Robert Noland) Date: Fri Sep 25 14:40:36 2009 Subject: sx locks and memory barriers In-Reply-To: References: <20090924224935.GW473@gandalf.sssup.it> <1253877997.2031.2627.camel@balrog.2hip.net> Message-ID: <1253889620.2065.12.camel@balrog.2hip.net> On Fri, 2009-09-25 at 09:30 -0400, Ryan Stone wrote:
> The code that Fabio proposes looks like this:
>
>         sx_slock(&data->lock);
>         if (data->buffer)
>                 a = *data->buffer;
>         sx_sunlock(&data->lock);
>
> The point is that without a memory barrier on the unlock, the CPU is
> free to reorder the instructions into the order in his message.

Ok, then I will sit back and wait for someone with more clue to respond... robert. -- Robert Noland FreeBSD From gabor at FreeBSD.org Sun Sep 27 13:27:09 2009 From: gabor at FreeBSD.org (Gabor Kovesdan) Date: Sun Sep 27 13:27:17 2009 Subject: BSDL texttools status and further thoughts... Message-ID: <4ABF6824.9090601@FreeBSD.org> Hello all, recently, I've had a discussion with rdivacky@ about the status of these tools. It's about bc, dc, grep, sort and iconv.
He has persuaded me to write a summary here in case someone else is interested in contributing to these tools. So here I come with a little summary. BSD bc/dc will come just after 8.0-RELEASE. They are quite mature, and delphij@ offered to help me get them into the tree by reviewing and approving my changes (I only have a doc/ports commit bit). BSD grep is also quite mature; I recently fixed the last critical bug. My only concern is the performance. GNU grep is fast but has ~8 KSLOC. BSD grep is slightly slower but has only ~1.5 KSLOC. It's a huge difference in complexity, and GNU grep is very hard to read, but they use a lot of custom optimizations to get this performance. I think we should go another way and have a well-optimized and mature regex library. The current one is very old and doesn't have wchar support, it's slow like hell and doesn't support custom GNU bullshit, which is unfortunately necessary to maintain compatibility (e.g. "(a|)" is considered invalid in strict POSIX regex but GNU accepts it!). Because of this, BSD grep is linked to the GNU regex library at the moment, but because of the custom magic in GNU grep it's still a bit slower. If we can live with this slight performance hit, I think we can commit it, because it's quite feature-complete. You know, I'm a beginner, but I think that the code of BSD grep is so tiny and simple that there is almost no way to optimize it further by simplifying the code, so I think further optimization should be done in the regex library. As for the regex library, NetBSD's SoC project is worth a look. I'm interested in this but I have too many things in the queue to start another one... As for sort, it isn't so mature yet.
I've just made a TODO list of the known missing features or bugs:

- sometimes it segfaults when reading huge files
- the -k option isn't implemented yet
- the -n option doesn't work correctly
- preproc() optimization (I don't know what it refers to, actually, but I had it on my previous TODO list; will have to check)
- polishing the man page
- adding some more test cases to the regression test
- checking performance (in this case it really matters, because sorting is all about the algorithm and I'm not an algorithms guru... And this version of sort was written by me from scratch. The OpenBSD one isn't wchar-clean and can't be fixed by design. This sort is much smaller, but it seems the algorithm isn't optimal.)

As for iconv, I'll keep working on it in my BSc thesis. The forward (foo -> utf32) conversions are almost completely GNU-compatible, the reverse ones not so much. GNU iconv makes transliteration optional, while BSD iconv transliterates by default, so I compared our output to GNU's transliterated output, and GNU has some more advanced mappings for this. Apart from this, almost all encodings that we have in locale(1) charmaps are supported, but the Big5 module segfaults. I hope I'll be able to solve these issues and check performance as part of my BSc thesis. Regards, -- Gabor Kovesdan FreeBSD Volunteer EMAIL: gabor@FreeBSD.org .:|:. gabor@kovesdan.org WEB: http://people.FreeBSD.org/~gabor .:|:. http://kovesdan.org From yuri at rawbw.com Sun Sep 27 23:30:53 2009 From: yuri at rawbw.com (Yuri) Date: Sun Sep 27 23:30:59 2009 Subject: 72-STABLE compilation errors Message-ID: <4ABFF5AA.70303@rawbw.com> I updated the source today (RELENG_7) and got the errors below. I compile on the i386 platform.
Yuri

---- errors ----
cc -I/usr/local/include -DIN_GCC -DHAVE_CONFIG_H -DPREFIX=\"/usr/obj/usr/src/tmp/usr\" -I/usr/obj/usr/src/tmp/usr/src/gnu/usr.bin/cc/cc_int/../cc_tools -I/usr/src/gnu/usr.bin/cc/cc_int/../cc_tools -I/usr/src/gnu/usr.bin/cc/cc_int/../../../../contrib/gcc -I/usr/src/gnu/usr.bin/cc/cc_int/../../../../contrib/gcc/config -I/usr/src/gnu/usr.bin/cc/cc_int/../../../../contrib/gcclibs/include -I/usr/src/gnu/usr.bin/cc/cc_int/../../../../contrib/gcclibs/libcpp/include -I/usr/src/gnu/usr.bin/cc/cc_int/../../../../contrib/gcclibs/libdecnumber -I/usr/obj/usr/src/tmp/legacy/usr/include -c /usr/src/gnu/usr.bin/cc/cc_int/../../../../contrib/gcc/config/i386/i386.c
/usr/src/gnu/usr.bin/cc/cc_int/../../../../contrib/gcc/config/i386/i386.c: In function 'ix86_function_regparm':
/usr/src/gnu/usr.bin/cc/cc_int/../../../../contrib/gcc/config/i386/i386.c:2502: warning: initialization makes pointer from integer without a cast
/usr/src/gnu/usr.bin/cc/cc_int/../../../../contrib/gcc/config/i386/i386.c:2503: error: dereferencing pointer to incomplete type
/usr/src/gnu/usr.bin/cc/cc_int/../../../../contrib/gcc/config/i386/i386.c: In function 'ix86_function_sseregparm':
/usr/src/gnu/usr.bin/cc/cc_int/../../../../contrib/gcc/config/i386/i386.c:2581: warning: initialization makes pointer from integer without a cast
/usr/src/gnu/usr.bin/cc/cc_int/../../../../contrib/gcc/config/i386/i386.c:2582: error: dereferencing pointer to incomplete type
*** Error code 1

Stop in /usr/src/gnu/usr.bin/cc/cc_int.
*** Error code 1

From forensec at yahoo.de Mon Sep 28 16:51:23 2009 From: forensec at yahoo.de (Leunam Elebek) Date: Mon Sep 28 16:51:30 2009 Subject: Trouble with copyout, memcpy.... Message-ID: <112995.77056.qm@web28510.mail.ukl.yahoo.com> Hey list, I currently code a driver under Current 8.0 for Current 8.0. But there are some problems with kernel/user-space interaction. I've the following structure:

struct daq_kitinfo {
        uint32_t ki_maxdata;
        uint32_t ki_flags;
        uint32_t ki_rng_type;
        int      ki_type;
        int      ki_nchan;
        int      ki_len_chanl;
};

The above structure is used in my user-space app:

int main(void) {
        struct daq_kitinfo *info;
        struct daq_kit kit;
        int fd, size;
        ...
        ...
        ...
        /*
         * At this point I'll try to alloc memory. Notice that
         * the size is dependent on another struct entry.
         */
        size = sizeof(*info) * kit.k_nkits;
        info = malloc(size);
        if (info == NULL)
                exit(ENOMEM);

        /*
         * The next step is to call the drivers ioctl() interface
         * (the reason for that is described below).
         */
        if (ioctl(fd, DAQ_KITINFO, info)) {
                printf("errno: %d\n", errno);
                exit(errno);
        }
        printf("[ki_nchan] %d\n", ki_nchan);
        ...
        ...
        return (0);
}

and inside the driver (put it simply):

static int
my_ioctl(struct cdev *dev, u_long cmd, caddr_t arg, int flags,
    struct thread *td)
{
        struct daq_kitinfo *info;
        struct daq_kit = dev->si_drv1;
        int size;
        ...
        /* Do something useful e.g mutex'ing... */
        ...
        /* The same as in user-space... */
        size = sizeof(*info) * kit.k_nkits;
        info = malloc(sz, M_DAQ, M_NOWAIT | M_ZERO);
        if (info == NULL)
                ....
        /*
         * Here I want to copy struct info from kernel to user-space.
         * If I use memcpy, the result is that the system hangs
         * and I need to reboot the machine. OK, I thought
         * copyout() should be able to do the job for me...
         */
        if (copyout(info, arg, sz))
                /*
                 * Fuc[k-k] I still come inside this block. I always
                 * get an EFAULT error.
                 */
}

I really don't know what I should do to get the driver working properly.
The driver should grab some information/attributes and fill up the info structure, so we can copy the filled info struct to the user's app. I hope somebody can help me to resolve that problem. Ah, the corresponding ioctl is:

#define GRP             'd'
#define DAQ_KITINFO     _IOR(GRP, 3, struct daq_kitinfo)

Thanks for attention and greetings from Germany
MG From forensec at yahoo.de Mon Sep 28 17:19:09 2009 From: forensec at yahoo.de (Leunam Elebek) Date: Mon Sep 28 17:19:16 2009 Subject: Trouble with copyout, memcpy.... Plain-Text version =) Message-ID: <389605.70197.qm@web28503.mail.ukl.yahoo.com> Hey list, I currently code a driver under Current 8.0 for Current 8.0. But there are some problems with kernel/user-space interaction. I've the following structure:

struct daq_kitinfo {
        uint32_t ki_maxdata;
        uint32_t ki_flags;
        uint32_t ki_rng_type;
        int      ki_type;
        int      ki_nchan;
        int      ki_len_chanl;
};

The above structure is used in my user-space app:

int main(void) {
        struct daq_kitinfo *info;
        struct daq_kit kit;
        int fd, size;
        ...
        ...
        ...
        /*
         * At this point I'll try to alloc memory. Notice that
         * the size is dependent on another struct entry.
         */
        size = sizeof(*info) * kit.k_nkits;
        info = malloc(size);
        if (info == NULL)
                exit(ENOMEM);

        /*
         * The next step is to call the drivers ioctl() interface
         * (the reason for that is described below).
         */
        if (ioctl(fd, DAQ_KITINFO, info)) {
                printf("errno: %d\n", errno);
                exit(errno);
        }
        printf("[ki_nchan] %d\n", info.ki_nchan);
        ...
        ...
        return (0);
}

and inside the driver (put it simply):

static int
my_ioctl(struct cdev *dev, u_long cmd, caddr_t arg, int flags,
    struct thread *td)
{
        struct daq_kitinfo *info;
        struct daq_kit = dev->si_drv1;
        int size;
        ...
        /* Do something useful e.g mutex'ing... */
        ...
        /* The same as in user-space... */
        size = sizeof(*info) * kit.k_nkits;
        info = malloc(sz, M_DAQ, M_NOWAIT | M_ZERO);
        if (info == NULL)
                ....
        /*
         * Here I want to copy struct info from kernel to user-space.
         * If I use memcpy, the result is that the system hangs
         * and I need to reboot the machine. OK, I thought
         * copyout() should be able to do the job for me...
         */
        if (copyout(info, arg, sz))
                /*
                 * Fuc[k-k] I still come inside this block. I always
                 * get an EFAULT error.
                 */
}

I really don't know what I should do to get the driver working properly. The driver should grab some information/attributes and fill up the info structure, so we can copy the filled info struct to the user's app. I hope somebody can help me to resolve that problem. Ah, the corresponding ioctl is:

#define GRP             'd'
#define DAQ_KITINFO     _IOR(GRP, 3, struct daq_kitinfo)

Thanks for attention and greetings from Germany
MG From des at des.no Mon Sep 28 18:57:35 2009 From: des at des.no (Dag-Erling Smørgrav) Date: Mon Sep 28 18:57:43 2009 Subject: Trouble with copyout, memcpy.... Plain-Text version =) In-Reply-To: <389605.70197.qm@web28503.mail.ukl.yahoo.com> (Leunam Elebek's message of "Mon, 28 Sep 2009 10:19:08 -0700 (PDT)") References: <389605.70197.qm@web28503.mail.ukl.yahoo.com> Message-ID: <86ws3i3nj6.fsf@ds4.des.no> Leunam Elebek writes:

>         /* The same as in user-space... */
>         size = sizeof(*info) * kit.k_nkits;
>         info = malloc(sz, M_DAQ, M_NOWAIT | M_ZERO);

You shouldn't use M_NOWAIT unless there is absolutely no way around it.

>         if (info == NULL)
>                 ....

Unnecessary if you use M_WAITOK instead of M_NOWAIT.
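The EFAULT in the question follows from how FreeBSD's generic ioctl(2) path handles a command built with _IOR. Before calling the driver, it sets up a kernel buffer whose size is encoded in the command (here sizeof(struct daq_kitinfo), i.e. exactly one struct) and passes that buffer to the handler as arg. After the handler returns, it copies the buffer back to userland. The userland model below is a sketch under that assumption. sketch_generic_ior() and daq_kitinfo_handler() are hypothetical names, memcpy() stands in for copyout(9), and the ki_nchan value is made up. It shows that the handler fills arg in place and never calls copyout() on it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Same layout as the struct in the question. */
struct daq_kitinfo {
	uint32_t ki_maxdata;
	uint32_t ki_flags;
	uint32_t ki_rng_type;
	int      ki_type;
	int      ki_nchan;
	int      ki_len_chanl;
};

/* Hypothetical handler type: for an _IOR command, arg is a kernel buffer. */
typedef int (*sketch_handler_t)(void *arg);

static int
daq_kitinfo_handler(void *arg)
{
	struct daq_kitinfo *ki = arg;

	assert(arg != NULL);
	/* Fill the buffer in place; no copyout() here.  Values are made up. */
	memset(ki, 0, sizeof(*ki));
	ki->ki_nchan = 16;
	return (0);
}

/*
 * Userland model of the generic ioctl path for an _IOR command:
 * set up a zeroed buffer of the size encoded in the command, call the
 * handler, then copy the result out (memcpy stands in for copyout(9)).
 */
static int
sketch_generic_ior(size_t len, sketch_handler_t handler, void *user_buf)
{
	void *kbuf;
	int error;

	kbuf = calloc(1, len);
	if (kbuf == NULL)
		return (ENOMEM);
	error = handler(kbuf);
	if (error == 0)
		memcpy(user_buf, kbuf, len);
	free(kbuf);
	return (error);
}
```

Because the buffer holds only len bytes, a handler that copies k_nkits structs into it would overrun it. This is why returning several structs needs one of the other interfaces discussed below.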
>         /*
>          * Here I want to copy struct info from kernel to user-space.
>          * If i use memcpy, the result is that the system hangs
>          * and I need to reboot the machine. OK, I thought
>          * copyout() should be able to do the job for me...
>          */
>         if (copyout(info, arg, sz))

Nope, ioctl() takes care of the copyin() / copyout(). At this point, arg is a pointer to a malloc()ed buffer of the right size (as specified by the definition of DAQ_KITINFO).

>                 /*
>                  * Fuc[k-k] i still come inside this block. I always
>                  * get an EFAULT error.
>                  */

This means that either a) info doesn't point where you think it does, b) arg doesn't point where you think it does, or c) sz doesn't have the value you think it does. In this case, it's a combination of the latter two: arg points to a kernel buffer, so the use of copyout(9) is inappropriate, but in addition, the size of that buffer is sizeof(struct daq_kitinfo), and you're trying to copy far more. You need to rethink your interface: either return only one struct daq_kitinfo per ioctl() call, or pass in a struct that contains a pointer to a userland buffer and a length, or use something other than ioctl(2). Option 2 would be something like:

struct daq_ioctl {
        struct daq_kitinfo *info;
        int nkits;
};

#define GRP             'd'
#define DAQ_KITINFO     _IOWR(GRP, 3, struct daq_ioctl)

static int
my_ioctl(struct cdev *dev, u_long cmd, caddr_t arg, int flags,
    struct thread *td)
{
        struct daq_ioctl *di = (struct daq_ioctl *)arg;
        struct daq_kitinfo *info;
        struct daq_kit kit;
        int nkits, ret;

        /* ... */
        nkits = (kit.k_nkits > di->nkits) ? di->nkits : kit.k_nkits;
        info = malloc(nkits * sizeof(struct daq_kitinfo), M_DAQ, M_WAITOK);
        /* ... */
        ret = copyout(info, di->info, nkits * sizeof(struct daq_kitinfo));
        free(info, M_DAQ);
        /* let userland know what it got */
        if (ret == 0)
                di->nkits = nkits;
        return (ret);
}

DES -- Dag-Erling Smørgrav - des@des.no From tom at tomjudge.com Mon Sep 28 19:12:45 2009 From: tom at tomjudge.com (Tom Judge) Date: Mon Sep 28 19:12:52 2009 Subject: Help debugging: Fatal kernel mode data abort: 'External Linefetch Abort (P)' Message-ID: <4AC106AA.9000305@tomjudge.com> Hi, I am working on getting FreeBSD to boot on a new ARM based board, and am hitting this issue any time I load a driver for the PCI based devices on the board. My current code can be found here: http://www.tomjudge.com/tmp/em7210.patch Here is the back trace of the problem (which I can repeat with em and ohci drivers):
RedBoot> load -b 0x01008000 kernel Using default protocol (TFTP) Address offset = 0x40000000 Entry point: 0x01008100, address range: 0x01008000-0x01349e28 RedBoot> go KDB: debugger backends: ddb KDB: current backend: ddb Copyright (c) 1992-2009 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 9.0-CURRENT #12: Sat Sep 26 05:00:06 UTC 2009 root@rita.nodomain:/data/arm_build/arm/usr/src/sys/EM-7210 CPU: i80219 400MHz step A-0 (XScale core) DC enabled IC enabled WB enabled LABT branch prediction enabled 32KB/32B 32-way Instruction cache 32KB/32B 32-way write-back-locking Data cache real memory = 536870912 (512 MB) avail memory = 503738368 (480 MB) iq0: on motherboard obio0 on iq0 uart0: <16550 or compatible> on obio0 uart0: [FILTER] uart0: console (115200,n,8,1) itimer0: on iq0 iopwdog0: on iq0 pcib0: on iq0 pci0: on pcib0 No mapping for 0/5/0/ No mapping for 0/5/1/ No mapping for 0/5/2/ pci0: at device 1.0 (no driver attached) pci0: at device 2.0 (no driver attached) atapci0: mem 0x80000-0x80fff irq 27 at device 3.0 on pci0 atapci0: [ITHREAD] ata2: on atapci0 Fatal kernel mode data abort: 'External Linefetch Abort (P)' trapframe: 0xc00faaf0 FSR=00000406, FAR=Invalid, spsr=600000d3 r0 =c13e2c00, r1 =cd5bc000, r2 =00000004, r3 =c13e2d7c r4 =c13e2c00, r5 =cd5bc000, r6 =c1298290, r7 =c1388800 r8 =ffffffff, r9 =00000009, r10=c13d8c00, r11=c00fab58 r12=c00fab24, ssp=c00fab3c, slr=c102389c, pc =c1023898 [thread pid 0 tid 100000 ] Stopped at ata_ahci_chipinit+0x4d68: ldr r15, [r3, #0x024] db> bt Tracing pid 0 tid 100000 td 0xc134dca0 db_trace_thread() at db_trace_thread+0xc scp=0xc129c68c rlv=0xc100d0f0 (db_command_init+0x2a8) rsp=0xc00fa7ec rfp=0xc00fa808 r10=0x00000001 r9=0xc13537f4 r8=0xc134b0c4 r7=0x00000062 r6=0x00000002 r5=0x00000010 r4=0xc134dca0 db_command_init() at db_command_init+0x1d0 scp=0xc100d018 rlv=0xc100cba0 (db_skip_to_eol+0x49c) rsp=0xc00fa80c rfp=0xc00fa8b0 r5=0x00000000 r4=0xc1327938 db_skip_to_eol() at db_skip_to_eol+0x1d0 scp=0xc100c8d4 rlv=0xc100cd0c (db_command_loop+0x60) rsp=0xc00fa8b4 rfp=0xc00fa8c0 r10=0x00000000 r8=0x00000406 r7=0xc00faaf0 r6=0xc13537f0 r5=0x600000d3 r4=0xc00fa8cc db_command_loop() at db_command_loop+0xc scp=0xc100ccb8 rlv=0xc100f050 (X_db_sym_numargs+0xf4) rsp=0xc00fa8c4 rfp=0xc00fa9e0 X_db_sym_numargs() at 
X_db_sym_numargs+0x14 scp=0xc100ef70 rlv=0xc1106e40 (kdb_trap+0xa4) rsp=0xc00fa9e4 rfp=0xc00faa0c r4=0x000000c0 kdb_trap() at kdb_trap+0xc scp=0xc1106da8 rlv=0xc12acb44 (badaddr_read+0x280) rsp=0xc00faa10 rfp=0xc00faa2c r10=0x00000000 r9=0x00000009 r8=0xc00faaf0 r7=0x00000406 r6=0x00000000 r5=0x00000406 r4=0xc00faaf0 badaddr_read() at badaddr_read+0xfc scp=0xc12ac9c0 rlv=0xc12acfdc (prefetch_abort_handler+0x440) rsp=0xc00faa30 rfp=0xc00faa50 r6=0xc134dca0 r5=0xc00faef8 r4=0xc00faaf0 prefetch_abort_handler() at prefetch_abort_handler+0x378 scp=0xc12acf14 rlv=0xc12ad1a8 (data_abort_handler+0x110) rsp=0xc00faa54 rfp=0xc00faaec r7=0xc134dca0 r6=0xc1298290 r5=0xc00faef8 r4=0xc134d9d8 data_abort_handler() at data_abort_handler+0xc scp=0xc12ad0a4 rlv=0xc129e0c8 (address_exception_entry+0x50) rsp=0xc00faaf0 rfp=0xc00fab58 r10=0xc13d8c00 r9=0x00000009 r8=0xffffffff r7=0xc1388800 r6=0xc1298290 r5=0xffff1004 r4=0xc13e2c00 ata_ahci_chipinit() at ata_ahci_chipinit+0x4c44 scp=0xc1023774 rlv=0xc101b664 (ata_mode2idx+0x464) rsp=0xc00fab5c rfp=0xc00fab78 r7=0xc1391900 r6=0xc13d8c00 r5=0xc1388800 r4=0xc13a01b0 ata_mode2idx() at ata_mode2idx+0x3ec scp=0xc101b5ec rlv=0xc1101a98 (device_attach+0x2c8) rsp=0xc00fab7c rfp=0xc00fabb8 r7=0xc1100140 r6=0xc13d8c4c r5=0x80000000 r4=0xc1383080 device_attach() at device_attach+0xc scp=0xc11017dc rlv=0xc1102e5c (device_probe_and_attach+0x34) rsp=0xc00fabbc rfp=0xc00fabcc r10=0xc1383080 r8=0xffffffff r7=0xc1100140 r6=0xc1383080 r5=0xc1391900 r4=0xc13d8c00 device_probe_and_attach() at device_probe_and_attach+0xc scp=0xc1102e34 rlv=0xc1102e80 (bus_generic_attach+0x20) rsp=0xc00fabd0 rfp=0xc00fabe0 r4=0xc13d8c00 bus_generic_attach() at bus_generic_attach+0xc scp=0xc1102e6c rlv=0xc101d11c (ata_pci_attach+0x2a4) rsp=0xc00fabe4 rfp=0xc00fac0c r4=0x00000004 ata_pci_attach() at ata_pci_attach+0xc scp=0xc101ce84 rlv=0xc1101a98 (device_attach+0x2c8) rsp=0xc00fac10 rfp=0xc00fac4c r7=0xc1100140 r6=0xc13830cc r5=0x80000000 r4=0xc1383200 device_attach() at 
device_attach+0xc scp=0xc11017dc rlv=0xc1102e5c (device_probe_and_attach+0x34) rsp=0xc00fac50 rfp=0xc00fac60 r10=0xc1383200 r8=0xffffffff r7=0xc1100140 r6=0x00000000 r5=0x00000000 r4=0xc1383080 device_probe_and_attach() at device_probe_and_attach+0xc scp=0xc1102e34 rlv=0xc1102e80 (bus_generic_attach+0x20) rsp=0xc00fac64 rfp=0xc00fac74 r4=0xc1383080 bus_generic_attach() at bus_generic_attach+0xc scp=0xc1102e6c rlv=0xc1067094 (pci_add_children+0x240) rsp=0xc00fac78 rfp=0xc00fac98 r4=0xc1383200 pci_add_children() at pci_add_children+0x154 scp=0xc1066fa8 rlv=0xc1101a98 (device_attach+0x2c8) rsp=0xc00fac9c rfp=0xc00facd8 r6=0xc138324c r5=0x80000000 r4=0xc13d9300 device_attach() at device_attach+0xc scp=0xc11017dc rlv=0xc1102e5c (device_probe_and_attach+0x34) rsp=0xc00facdc rfp=0xc00facec r10=0xc13d9300 r8=0xffffffff r7=0x00000000 r6=0xc13d9300 r5=0xc13bec18 r4=0xc1383200 device_probe_and_attach() at device_probe_and_attach+0xc scp=0xc1102e34 rlv=0xc1102e80 (bus_generic_attach+0x20) rsp=0xc00facf0 rfp=0xc00fad00 r4=0xc1383200 bus_generic_attach() at bus_generic_attach+0xc scp=0xc1102e6c rlv=0xc12b25a4 (i80321_sdram_bounds+0x860) rsp=0xc00fad04 rfp=0xc00fad20 r4=0xc13bec60 i80321_sdram_bounds() at i80321_sdram_bounds+0x6f4 scp=0xc12b2438 rlv=0xc1101a98 (device_attach+0x2c8) rsp=0xc00fad24 rfp=0xc00fad60 r7=0xc1100140 r6=0xc13d934c r5=0x80000000 r4=0xc13d9580 device_attach() at device_attach+0xc scp=0xc11017dc rlv=0xc1102e5c (device_probe_and_attach+0x34) rsp=0xc00fad64 rfp=0xc00fad74 r10=0xc13d9580 r8=0xc13d9580 r7=0x40000004 r6=0xc13e2c00 r5=0x00000000 r4=0xc13d9300 device_probe_and_attach() at device_probe_and_attach+0xc scp=0xc1102e34 rlv=0xc1102e80 (bus_generic_attach+0x20) rsp=0xc00fad78 rfp=0xc00fad88 r4=0xc13d9300 bus_generic_attach() at bus_generic_attach+0xc scp=0xc1102e6c rlv=0xc12b31e0 (iq80321_attach+0x370) rsp=0xc00fad8c rfp=0xc00fadb8 r4=0xc12eeca8 iq80321_attach() at iq80321_attach+0xc scp=0xc12b2e7c rlv=0xc1101a98 (device_attach+0x2c8) rsp=0xc00fadbc 
rfp=0xc00fadf8 r8=0xffffffff r7=0xc1100140 r6=0xc13d95cc r5=0x80000000 r4=0xc13d9680 device_attach() at device_attach+0xc scp=0xc11017dc rlv=0xc1102e5c (device_probe_and_attach+0x34) rsp=0xc00fadfc rfp=0xc00fae0c r10=0xc13d9680 r8=0xffffffff r7=0xc1100140 r6=0xc13d96cc r5=0x80000000 r4=0xc13d9580 device_probe_and_attach() at device_probe_and_attach+0xc scp=0xc1102e34 rlv=0xc1102e80 (bus_generic_attach+0x20) rsp=0xc00fae10 rfp=0xc00fae20 r4=0xc13d9580 bus_generic_attach() at bus_generic_attach+0xc scp=0xc1102e6c rlv=0xc12a0fec (minidumpsys+0xaf4) rsp=0xc00fae24 rfp=0xc00fae34 r4=0xc13d9680 minidumpsys() at minidumpsys+0xae4 scp=0xc12a0fdc rlv=0xc1101a98 (device_attach+0x2c8) rsp=0xc00fae38 rfp=0xc00fae74 r4=0xc12c30b8 device_attach() at device_attach+0xc scp=0xc11017dc rlv=0xc1102e5c (device_probe_and_attach+0x34) rsp=0xc00fae78 rfp=0xc00fae88 r10=0x0000000a r8=0x00000000 r7=0xa10081a4 r6=0xc13d9b80 r5=0xc1347e7c r4=0xc13d9680 device_probe_and_attach() at device_probe_and_attach+0xc scp=0xc1102e34 rlv=0xc1103140 (bus_generic_new_pass+0xe4) rsp=0xc00fae8c rfp=0xc00faea4 r4=0xc13d9680 bus_generic_new_pass() at bus_generic_new_pass+0xc scp=0xc1103068 rlv=0xc10ff13c (bus_set_pass+0x98) rsp=0xc00faea8 rfp=0xc00faec0 r6=0x7fffffff r5=0xc13d9b80 r4=0xc13f16c0 bus_set_pass() at bus_set_pass+0xc scp=0xc10ff0b0 rlv=0xc10ff184 (root_bus_configure+0x14) rsp=0xc00faec4 rfp=0xc00faed0 r6=0x00000006 r5=0xa10081b0 r4=0xc12f0cb4 root_bus_configure() at root_bus_configure+0xc scp=0xc10ff17c rlv=0xc129708c (xdr_sizeof+0x1d0) rsp=0xc00faed4 rfp=0xc00faee0 xdr_sizeof() at xdr_sizeof+0x1cc scp=0xc1297088 rlv=0xc108e39c (mi_startup+0xdc) rsp=0xc00faee4 rfp=0xc00faef4 mi_startup() at mi_startup+0xc scp=0xc108e2cc rlv=0xc1008248 (btext+0x148) rsp=0xc00faef8 rfp=0x00000000 r4=0xa1008288 Thanks Tom From mlfbsd at ci0.org Mon Sep 28 20:48:35 2009 From: mlfbsd at ci0.org (Olivier Houchard) Date: Mon Sep 28 20:51:20 2009 Subject: Help debugging: Fatal kernel mode data abort: 'External Linefetch 
Abort (P)' In-Reply-To: <4AC106AA.9000305@tomjudge.com> References: <4AC106AA.9000305@tomjudge.com> Message-ID: <20090928202132.GA15236@ci0.org> On Mon, Sep 28, 2009 at 06:55:38PM +0000, Tom Judge wrote: > Hi, > > I am working on getting FreeBSD to boot on a new ARM based board, and am > hitting this issue any time I load a driver for the PCI based devices on > the board. > > My current code can be found here: > > http://www.tomjudge.com/tmp/em7210.patch > Hi Tom, My guess is, you should include std.i80219 instead of std.i80321 in std.em7210. If you do not, CPU_XSCALE_80219 won't be defined, and the 80321 code to check if the board is host or not will be used, and will wrongly assume it is not, and thus won't map the PCI mem correctly. Regards, Olivier From tom at tomjudge.com Tue Sep 29 02:05:59 2009 From: tom at tomjudge.com (Tom Judge) Date: Tue Sep 29 02:06:06 2009 Subject: Help debugging: Fatal kernel mode data abort: 'External Linefetch Abort (P)' In-Reply-To: <20090928202132.GA15236@ci0.org> References: <4AC106AA.9000305@tomjudge.com> <20090928202132.GA15236@ci0.org> Message-ID: <4AC16B5A.8090407@tomjudge.com> Olivier Houchard wrote: > On Mon, Sep 28, 2009 at 06:55:38PM +0000, Tom Judge wrote: > >> Hi, >> >> I am working on getting FreeBSD to boot on a new ARM based board, and am >> hitting this issue any time I load a driver for the PCI based devices on >> the board. >> >> My current code can be found here: >> >> http://www.tomjudge.com/tmp/em7210.patch >> >> > > Hi Tom, > > My guess is, you should include std.i80219 instead of std.i80321 in std.em7210. > If you do not, CPU_XSCALE_80219 won't be defined, and the 80321 code to > check if the board is host or not will be used, and will wrongly assume > it is not, and thus won't map the PCI mem correctly. > > Hi Olivier, I have switched out the std file and am now using std.i80219 but am still having issues. I think the problems are the pci memory mappings in the controller devices. 
On linux em0 gets mapped as follows:

cd 0000\:00\:01.0/
# ls
class            device           local_cpus       subsystem_device
config           driver           resource         subsystem_vendor
detach_state     irq              rom              vendor
# cat resource
0x0000000080000000 0x000000008001ffff 0x0000000000000200
0x0000000080020000 0x000000008003ffff 0x0000000000000200
0x00000000fe000000 0x00000000fe00003f 0x0000000000000101
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000080040000 0x000000008005ffff 0x0000000000007200
#

Whereas on FreeBSD I am seeing this:

em0: port 0xfe400000-0xfe40003f mem 0-0x1ffff,0x20000-0x3ffff irq 29 at device 1.0 on pci0

It seems that I am missing the 0x800 off the front of the PCI memory mappings. I have confirmed this with the ata driver also and see the same issues. Where should I be looking to fix this? Thanks Tom From tom at tomjudge.com Tue Sep 29 02:46:13 2009 From: tom at tomjudge.com (Tom Judge) Date: Tue Sep 29 02:46:19 2009 Subject: Help debugging: Fatal kernel mode data abort: 'External Linefetch Abort (P)' In-Reply-To: <4AC16B5A.8090407@tomjudge.com> References: <4AC106AA.9000305@tomjudge.com> <20090928202132.GA15236@ci0.org> <4AC16B5A.8090407@tomjudge.com> Message-ID: <4AC174D1.8080900@tomjudge.com> Tom Judge wrote: > Olivier Houchard wrote: >> On Mon, Sep 28, 2009 at 06:55:38PM +0000, Tom Judge wrote: >> >>> Hi, >>> >>> I am working on getting FreeBSD to boot on a new ARM based board, >>> and am hitting this issue any time I load a driver for the PCI based >>> devices on the board. >>> >>> My current code can be found here: >>> >>> http://www.tomjudge.com/tmp/em7210.patch >>> >>> >> >> Hi Tom, >> >> My guess is, you should include std.i80219 instead of std.i80321 in >> std.em7210.
>> If you do not, CPU_XSCALE_80219 won't be defined, and the 80321 code to >> check if the board is host or not will be used, and will wrongly >> assume it is not, and thus won't map the PCI mem correctly. >> >> > Hi Olivier, > > I have switched out the std file and am now using std.i80219 but am > still having issues. > > I think the problems are the pci memory mappings in the controller > devices. > > On linux em0 gets mapped as follows: > > cd 0000\:00\:01.0/ > # ls > class device local_cpus subsystem_device > config driver resource subsystem_vendor > detach_state irq rom vendor > # cat resource > 0x0000000080000000 0x000000008001ffff 0x0000000000000200 > 0x0000000080020000 0x000000008003ffff 0x0000000000000200 > 0x00000000fe000000 0x00000000fe00003f 0x0000000000000101 > 0x0000000000000000 0x0000000000000000 0x0000000000000000 > 0x0000000000000000 0x0000000000000000 0x0000000000000000 > 0x0000000000000000 0x0000000000000000 0x0000000000000000 > 0x0000000080040000 0x000000008005ffff 0x0000000000007200 > # > > > > Where as on FreeBSD I am seeing this: > em0: port > 0xfe400000-0xfe40003f mem 0-0x1ffff,0x20000-0x3ffff irq 29 at device > 1.0 on pci0 > > Seems that I am missing the 0x800 off the front of the PCI memory > mappings. > > I have confirmed this with the ata driver also and see the same issues. > > Where should I be looking to fix this? 
> Forgot to include the output from VERBOSE_INIT_ARM iq0: on motherboard i80321: BAR0 = 20000004.00000000 BAR1 = 40000004.00000000 i80219: BAR0 = 20000000.00000000 BAR1 = 40000000.00000000 i80321: SBDR = 0xa0000000 SBR0 = 0x00000018 SBR1 = 0x00000020 i80321: BANK0 = 0x10000000 BANK1 = 0x10000000 i80321: Reserve space for private devices (Inbound Window 1) hi:0x00000000 lo:0x8000000c xlate:0x80000000 size:0x04000000 i80321: RAM access (Inbound Window 2) hi:0x00000000 lo:0xa000000c xlate:0xa0000000 size:0x20000000 obio0 on iq0 uart0: <16550 or compatible> on obio0 From mlfbsd at kanar.ci0.org Tue Sep 29 09:38:01 2009 From: mlfbsd at kanar.ci0.org (Olivier Houchard) Date: Tue Sep 29 11:36:54 2009 Subject: Help debugging: Fatal kernel mode data abort: 'External Linefetch Abort (P)' In-Reply-To: <4AC16B5A.8090407@tomjudge.com> References: <4AC106AA.9000305@tomjudge.com> <20090928202132.GA15236@ci0.org> <4AC16B5A.8090407@tomjudge.com> Message-ID: <20090929093825.GA26424@ci0.org> On Tue, Sep 29, 2009 at 02:05:14AM +0000, Tom Judge wrote: > Olivier Houchard wrote: > >On Mon, Sep 28, 2009 at 06:55:38PM +0000, Tom Judge wrote: > > > >>Hi, > >> > >>I am working on getting FreeBSD to boot on a new ARM based board, and am > >>hitting this issue any time I load a driver for the PCI based devices on > >>the board. > >> > >>My current code can be found here: > >> > >>http://www.tomjudge.com/tmp/em7210.patch > >> > >> > > > >Hi Tom, > > > >My guess is, you should include std.i80219 instead of std.i80321 in > >std.em7210. > >If you do not, CPU_XSCALE_80219 won't be defined, and the 80321 code to > >check if the board is host or not will be used, and will wrongly assume > >it is not, and thus won't map the PCI mem correctly. > > > > > Hi Olivier, > > I have switched out the std file and am now using std.i80219 but am > still having issues. > > I think the problems are the pci memory mappings in the controller devices. 
> > On linux em0 gets mapped as follows: > > cd 0000\:00\:01.0/ > # ls > class device local_cpus subsystem_device > config driver resource subsystem_vendor > detach_state irq rom vendor > # cat resource > 0x0000000080000000 0x000000008001ffff 0x0000000000000200 > 0x0000000080020000 0x000000008003ffff 0x0000000000000200 > 0x00000000fe000000 0x00000000fe00003f 0x0000000000000101 > 0x0000000000000000 0x0000000000000000 0x0000000000000000 > 0x0000000000000000 0x0000000000000000 0x0000000000000000 > 0x0000000000000000 0x0000000000000000 0x0000000000000000 > 0x0000000080040000 0x000000008005ffff 0x0000000000007200 > # > > > > Where as on FreeBSD I am seeing this: > em0: port > 0xfe400000-0xfe40003f mem 0-0x1ffff,0x20000-0x3ffff irq 29 at device 1.0 > on pci0 > > Seems that I am missing the 0x800 off the front of the PCI memory mappings. > Ok I'm a bit confused about this code, it's been too long since I haven't read it :) Could you try the attached patch ? Thanks ! If it doesn't help, you can print adapter->osdep.mem_bus_space_handle in if_em.c to make sure it is the same as in linux. Regards, Olivier -------------- next part -------------- Index: arm/xscale/i80321/i80321_pci.c =================================================================== --- arm/xscale/i80321/i80321_pci.c (revision 196158) +++ arm/xscale/i80321/i80321_pci.c (working copy) @@ -92,8 +92,7 @@ sc->sc_busno = busno; sc->sc_pciio = &i80321_softc->sc_pci_iot; sc->sc_pcimem = &i80321_softc->sc_pci_memt; - sc->sc_mem = i80321_softc->sc_owin[0].owin_xlate_lo + - VERDE_OUT_XLATE_MEM_WIN_SIZE; + sc->sc_mem = i80321_softc->sc_owin[0].owin_xlate_lo; sc->sc_io = i80321_softc->sc_iow_vaddr; /* Initialize memory and i/o rmans. 
*/ From tom at tomjudge.com Tue Sep 29 14:53:29 2009 From: tom at tomjudge.com (Tom Judge) Date: Tue Sep 29 14:53:38 2009 Subject: Help debugging: Fatal kernel mode data abort: 'External Linefetch Abort (P)' In-Reply-To: <20090929093825.GA26424@ci0.org> References: <4AC106AA.9000305@tomjudge.com> <20090928202132.GA15236@ci0.org> <4AC16B5A.8090407@tomjudge.com> <20090929093825.GA26424@ci0.org> Message-ID: <4AC21F44.6060004@tomjudge.com> Olivier Houchard wrote: > On Tue, Sep 29, 2009 at 02:05:14AM +0000, Tom Judge wrote: > >> Hi Olivier, >> >> I have switched out the std file and am now using std.i80219 but am >> still having issues. >> >> I think the problems are the pci memory mappings in the controller devices. >> >> On linux em0 gets mapped as follows: >> >> cd 0000\:00\:01.0/ >> # ls >> class device local_cpus subsystem_device >> config driver resource subsystem_vendor >> detach_state irq rom vendor >> # cat resource >> 0x0000000080000000 0x000000008001ffff 0x0000000000000200 >> 0x0000000080020000 0x000000008003ffff 0x0000000000000200 >> 0x00000000fe000000 0x00000000fe00003f 0x0000000000000101 >> 0x0000000000000000 0x0000000000000000 0x0000000000000000 >> 0x0000000000000000 0x0000000000000000 0x0000000000000000 >> 0x0000000000000000 0x0000000000000000 0x0000000000000000 >> 0x0000000080040000 0x000000008005ffff 0x0000000000007200 >> # >> >> >> >> Where as on FreeBSD I am seeing this: >> em0: port >> 0xfe400000-0xfe40003f mem 0-0x1ffff,0x20000-0x3ffff irq 29 at device 1.0 >> on pci0 >> >> Seems that I am missing the 0x800 off the front of the PCI memory mappings. >> >> > > Ok I'm a bit confused about this code, it's been too long since I haven't > read it :) > Could you try the attached patch ? > > Thanks ! > > If it doesn't help, you can print adapter->osdep.mem_bus_space_handle in > if_em.c to make sure it is the same as in linux. 
> > Hi Olivier, I have tried the patch and here are the boot results: i80321: BAR0 = 20000004.00000000 BAR1 = 40000004.00000000 i80219: BAR0 = 20000000.00000000 BAR1 = 40000000.00000000 i80219: I/O Processor, acting as PCI host i80321: SBDR = 0xa0000000 SBR0 = 0x00000018 SBR1 = 0x00000020 i80321: BANK0 = 0x10000000 BANK1 = 0x10000000 i80321: Reserve space for private devices (Inbound Window 1) hi:0x00000000 lo:0x8000000c xlate:0x80000000 size:0x04000000 i80321: RAM access (Inbound Window 2) hi:0x00000000 lo:0xa000000c xlate:0xa0000000 size:0x20000000 obio0 on iq0 uart0: <16550 or compatible> on obio0 uart0: [FILTER] uart0: console (115200,n,8,1) itimer0: on iq0 iopwdog0: on iq0 pcib0: on iq0 pci0: on pcib0 Device 1 routed to irq 27 Device 2 routed to irq 30 Device 3 routed to irq 29 Device 5 routed to irq 30 Device 5 routed to irq 29 Device 5 routed to irq 27 em0: port 0xfe400000-0xfe40003f mem 0-0x1ffff,0x20000-0x3ffff irq 27 at device 1.0 on pci0 em0: Start: 0x00000000 em0: End: 0x0001FFFF em0: Size: 0x00020000 Fatal kernel mode data abort: 'External Linefetch Abort (P)' trapframe: 0xc00faad0 FSR=00000406, FAR=Invalid, spsr=200000d3 r0 =c00d0400, r1 =cd5bf000, r2 =00000010, r3 =0000000a r4 =c317e008, r5 =cd5bf000, r6 =c00d0400, r7 =c130212c r8 =c317e008, r9 =c0071180, r10=c317e000, r11=c00fab40 r12=c00fab44, ssp=c00fab1c, slr=c106a96c, pc =c106a968 [thread pid 0 tid 100000 ] Stopped at e1000_init_script_state_82541+0x24c: blx r7 db> As you can see I added some debug to if_em.c as such: Index: sys/dev/e1000/if_em.c =================================================================== --- sys/dev/e1000/if_em.c (revision 197472) +++ sys/dev/e1000/if_em.c (working copy) @@ -2770,6 +2770,9 @@ rman_get_bustag(adapter->memory); adapter->osdep.mem_bus_space_handle = rman_get_bushandle(adapter->memory); + device_printf(dev,"Start: 0x%08lX\n", rman_get_start(adapter->memory)); + device_printf(dev,"End: 0x%08lX\n", rman_get_end(adapter->memory)); + device_printf(dev,"Size: 
0x%08lX\n", rman_get_size(adapter->memory)); adapter->hw.hw_addr = (u8 *)&adapter->osdep.mem_bus_space_handle; /* Only older adapters use IO mapping */ But the memory mapping seems to be missing the most significant 0x8. Thanks Tom From attilio at freebsd.org Tue Sep 29 16:09:04 2009 From: attilio at freebsd.org (Attilio Rao) Date: Tue Sep 29 16:09:13 2009 Subject: sx locks and memory barriers In-Reply-To: <20090924224935.GW473@gandalf.sssup.it> References: <20090924224935.GW473@gandalf.sssup.it> Message-ID: <3bbf2fe10909290839w305c85c3t1532bd7733c39a6a@mail.gmail.com> 2009/9/25 Fabio Checconi : > Hi all, > looking at sys/sx.h I have some troubles understanding this comment: > > * A note about memory barriers. Exclusive locks need to use the same > * memory barriers as mutexes: _acq when acquiring an exclusive lock > * and _rel when releasing an exclusive lock. On the other side, > * shared lock needs to use an _acq barrier when acquiring the lock > * but, since they don't update any locked data, no memory barrier is > * needed when releasing a shared lock. > > In particular, I'm not understanding what prevents the following sequence > from happening: > > CPU A CPU B > > sx_slock(&data->lock); > > sx_sunlock(&data->lock); > > /* reordered after the unlock > by the cpu */ > if (data->buffer) > sx_xlock(&data->lock); > free(data->buffer); > data->buffer = NULL; > sx_xunlock(&data->lock); > > a = *data->buffer; > > IOW, even if readers do not modify the data protected by the lock, > without a release barrier a memory access may leak past the unlock (as > the cpu won't notice any dependency between the unlock and the fetch, > feeling free to reorder them), thus potentially racing with an exclusive > writer accessing the data. > > On architectures where atomic ops serialize memory accesses this would > never happen, otherwise the sequence above seems possible; am I missing > something? 
I think your concerns are right, possibly we need this patch: http://www.freebsd.org/~attilio/sxrw_unlockb.diff However speaking with John we agreed possibly there is a more serious breakage. Possibly, memory barriers would also require to ensure the compiler to not reorder the operation, while right now, in FreeBSD, they just take care of the reordering from the architecture perspective. The only way I'm aware of GCC offers that is to clobber memory. I will provide a patch that address this soon, hoping that GCC will be smart enough to not overhead too much the memory clobbering but just try to understand what's our purpose and servers it (I will try to compare code generated before and after the patch at least for tier-1 architectures). Attilio -- Peace can only be achieved by understanding - A. Einstein From max at love2party.net Tue Sep 29 17:53:39 2009 From: max at love2party.net (Max Laier) Date: Tue Sep 29 17:53:46 2009 Subject: sx locks and memory barriers In-Reply-To: <3bbf2fe10909290839w305c85c3t1532bd7733c39a6a@mail.gmail.com> References: <20090924224935.GW473@gandalf.sssup.it> <3bbf2fe10909290839w305c85c3t1532bd7733c39a6a@mail.gmail.com> Message-ID: <200909291953.36373.max@love2party.net> On Tuesday 29 September 2009 17:39:37 Attilio Rao wrote: > 2009/9/25 Fabio Checconi : > > Hi all, > > looking at sys/sx.h I have some troubles understanding this comment: > > > > * A note about memory barriers. Exclusive locks need to use the same > > * memory barriers as mutexes: _acq when acquiring an exclusive lock > > * and _rel when releasing an exclusive lock. On the other side, > > * shared lock needs to use an _acq barrier when acquiring the lock > > * but, since they don't update any locked data, no memory barrier is > > * needed when releasing a shared lock. 
> > > > In particular, I'm not understanding what prevents the following sequence > > from happening: > > > > CPU A CPU B > > > > sx_slock(&data->lock); > > > > sx_sunlock(&data->lock); > > > > /* reordered after the unlock > > by the cpu */ > > if (data->buffer) > > sx_xlock(&data->lock); > > free(data->buffer); > > data->buffer = NULL; > > sx_xunlock(&data->lock); > > > > a = *data->buffer; > > > > IOW, even if readers do not modify the data protected by the lock, > > without a release barrier a memory access may leak past the unlock (as > > the cpu won't notice any dependency between the unlock and the fetch, > > feeling free to reorder them), thus potentially racing with an exclusive > > writer accessing the data. > > > > On architectures where atomic ops serialize memory accesses this would > > never happen, otherwise the sequence above seems possible; am I missing > > something? > > I think your concerns are right, possibly we need this patch: > http://www.freebsd.org/~attilio/sxrw_unlockb.diff > > However speaking with John we agreed possibly there is a more serious > breakage. Possibly, memory barriers would also require to ensure the > compiler to not reorder the operation, while right now, in FreeBSD, they > just take care of the reordering from the architecture perspective. > The only way I'm aware of GCC offers that is to clobber memory. > I will provide a patch that address this soon, hoping that GCC will be > smart enough to not overhead too much the memory clobbering but just > try to understand what's our purpose and servers it (I will try to > compare code generated before and after the patch at least for tier-1 > architectures). Does GCC really reorder accesses to volatile objects? The C Standard seems to object: 5.1.2.3 - 2 Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects,11) which are changes in the state of the execution environment. 
Evaluation of an expression may produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place. (A summary of the sequence points is given in annex C.) I might be reading this wrong, of course. -- /"\ Best regards, | mlaier@freebsd.org \ / Max Laier | ICQ #67774661 X http://pf4freebsd.love2party.net/ | mlaier@EFnet / \ ASCII Ribbon Campaign | Against HTML Mail and News From jhb at freebsd.org Tue Sep 29 18:30:07 2009 From: jhb at freebsd.org (John Baldwin) Date: Tue Sep 29 18:30:13 2009 Subject: sx locks and memory barriers In-Reply-To: <3bbf2fe10909290839w305c85c3t1532bd7733c39a6a@mail.gmail.com> References: <20090924224935.GW473@gandalf.sssup.it> <3bbf2fe10909290839w305c85c3t1532bd7733c39a6a@mail.gmail.com> Message-ID: <200909291425.46134.jhb@freebsd.org> On Tuesday 29 September 2009 11:39:37 am Attilio Rao wrote: > 2009/9/25 Fabio Checconi : > > Hi all, > > looking at sys/sx.h I have some troubles understanding this comment: > > > > * A note about memory barriers. Exclusive locks need to use the same > > * memory barriers as mutexes: _acq when acquiring an exclusive lock > > * and _rel when releasing an exclusive lock. On the other side, > > * shared lock needs to use an _acq barrier when acquiring the lock > > * but, since they don't update any locked data, no memory barrier is > > * needed when releasing a shared lock. 
> > > > In particular, I'm not understanding what prevents the following sequence > > from happening: > > > > CPU A CPU B > > > > sx_slock(&data->lock); > > > > sx_sunlock(&data->lock); > > > > /* reordered after the unlock > > by the cpu */ > > if (data->buffer) > > sx_xlock(&data->lock); > > free(data->buffer); > > data->buffer = NULL; > > sx_xunlock(&data->lock); > > > > a = *data->buffer; > > > > IOW, even if readers do not modify the data protected by the lock, > > without a release barrier a memory access may leak past the unlock (as > > the cpu won't notice any dependency between the unlock and the fetch, > > feeling free to reorder them), thus potentially racing with an exclusive > > writer accessing the data. > > > > On architectures where atomic ops serialize memory accesses this would > > never happen, otherwise the sequence above seems possible; am I missing > > something? > > I think your concerns are right, possibly we need this patch: > http://www.freebsd.org/~attilio/sxrw_unlockb.diff Actually, since you are only worried about reads, I think this should be an "acq" barrier rather than a "rel". In some cases "acq" is cheaper, so we should prefer the cheapest barrier that provides what we need. You would still need to keep some language about the memory barriers since using "acq" for shared unlocking is different from exclusive unlocking. I can't recall why I thought this was ok originally, sadly my p4 logs didn't include the reasoning either. :-/ > However speaking with John we agreed possibly there is a more serious breakage. > Possibly, memory barriers would also require to ensure the compiler to > not reorder the operation, while right now, in FreeBSD, they just take > care of the reordering from the architecture perspective. > The only way I'm aware of GCC offers that is to clobber memory. 
> I will provide a patch that address this soon, hoping that GCC will be > smart enough to not overhead too much the memory clobbering but just > try to understand what's our purpose and servers it (I will try to > compare code generated before and after the patch at least for tier-1 > architectures). The memory clobber is quite heavyweight. It actually forces gcc to forget any cached memory items in registers and reload everything again. What I really want is just a barrier to tell GCC to not reorder things. If I read a value in the program before acquiring a lock it is in theory fine to keep that cached across the barrier. However, there isn't a way to do this sort of thing with GCC currently. -- John Baldwin From attilio at freebsd.org Tue Sep 29 19:15:42 2009 From: attilio at freebsd.org (Attilio Rao) Date: Tue Sep 29 19:15:49 2009 Subject: sx locks and memory barriers In-Reply-To: <200909291425.46134.jhb@freebsd.org> References: <20090924224935.GW473@gandalf.sssup.it> <3bbf2fe10909290839w305c85c3t1532bd7733c39a6a@mail.gmail.com> <200909291425.46134.jhb@freebsd.org> Message-ID: <3bbf2fe10909291215i2bdd73aj13c1ac433152cab4@mail.gmail.com> 2009/9/29 John Baldwin : > On Tuesday 29 September 2009 11:39:37 am Attilio Rao wrote: >> 2009/9/25 Fabio Checconi : >> > Hi all, >> > looking at sys/sx.h I have some troubles understanding this comment: >> > >> > * A note about memory barriers. Exclusive locks need to use the same >> > * memory barriers as mutexes: _acq when acquiring an exclusive lock >> > * and _rel when releasing an exclusive lock. On the other side, >> > * shared lock needs to use an _acq barrier when acquiring the lock >> > * but, since they don't update any locked data, no memory barrier is >> > * needed when releasing a shared lock. 
>> > >> > In particular, I'm not understanding what prevents the following sequence >> > from happening: >> > >> > CPU A CPU B >> > >> > sx_slock(&data->lock); >> > >> > sx_sunlock(&data->lock); >> > >> > /* reordered after the unlock >> > by the cpu */ >> > if (data->buffer) >> > sx_xlock(&data->lock); >> > free(data->buffer); >> > data->buffer = NULL; >> > sx_xunlock(&data->lock); >> > >> > a = *data->buffer; >> > >> > IOW, even if readers do not modify the data protected by the lock, >> > without a release barrier a memory access may leak past the unlock (as >> > the cpu won't notice any dependency between the unlock and the fetch, >> > feeling free to reorder them), thus potentially racing with an exclusive >> > writer accessing the data. >> > >> > On architectures where atomic ops serialize memory accesses this would >> > never happen, otherwise the sequence above seems possible; am I missing >> > something? >> >> I think your concerns are right, possibly we need this patch: >> http://www.freebsd.org/~attilio/sxrw_unlockb.diff > > Actually, since you are only worried about reads, I think this should be > an "acq" barrier rather than a "rel". In some cases "acq" is cheaper, so we > should prefer the cheapest barrier that provides what we need. You would > still need to keep some language about the memory barriers since using "acq" > for shared unlocking is different from exclusive unlocking. Actually, I don't think that an acq barrier ensures enough protection against the reordering of 'earlier' operation thus not fixing the architecture ordering problem reported by Fabio. Also, I don't think we just have to care about reads (or I don't understand what you mean here). However, I'm not even sure that we have faster read barriers than the write one. As long as it should be true in theory I don't think that's what happen in practice. > The memory clobber is quite heavyweight. 
It actually forces gcc to forget any > cached memory items in registers and reload everything again. What I really > want is just a barrier to tell GCC to not reorder things. If I read a value > in the program before acquiring a lock it is in theory fine to keep that > cached across the barrier. However, there isn't a way to do this sort of > thing with GCC currently. Yes, that's the only tool we have right now with GCC. I will try to look for another way, but it sounds difficult to discover. Attilio -- Peace can only be achieved by understanding - A. Einstein From tom at tomjudge.com Tue Sep 29 19:39:33 2009 From: tom at tomjudge.com (Tom Judge) Date: Tue Sep 29 19:39:41 2009 Subject: Help debugging: Fatal kernel mode data abort: 'External Linefetch Abort (P)' In-Reply-To: <200909291908.n8TJ8UA2042239@casselton.net> References: <200909291908.n8TJ8UA2042239@casselton.net> Message-ID: <4AC2624D.30200@tomjudge.com> Mark Tinguely wrote: > I don't know anything about the code other than what I read today ... > > It appears from you boot traces the owin[0].owin_xlate_[lo | hi] values > should be fine in iq80321.c - an "VERBOSE_INIT_ARM" would confirm it. > > You might want to test if the "sc" pointer in iq80321.c has the same value > as the global "i80321_softc" pointer. You can add those print statements > to an "VERBOSE_INIT_ARM". It will tell you if something changed the global > pointer or if something overwrote the owin values in the structure. > > If global pointer or owin was changed before the pci attach code, you > can put the appropriate test into the earlier (obio, uart, itimer, iopwdtimer) > attach. None of these attaches use the global "i80321_softc" pointer. > > --Mark. > Hi Mark, Here is the log of a verbose_init_arm with added debug for owin[0].owin_xlate_[lo|hi]: RedBoot> go KDB: debugger backends: ddb KDB: current backend: ddb Copyright (c) 1992-2009 The FreeBSD Project. 
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 9.0-CURRENT #30: Tue Sep 29 19:02:49 UTC 2009 root@rita.nodomain:/data/arm_build/arm/usr/src/sys/EM-7210 CPU: i80219 400MHz step A-0 (XScale core) DC enabled IC enabled WB enabled LABT branch prediction enabled 32KB/32B 32-way Instruction cache 32KB/32B 32-way write-back-locking Data cache real memory = 536870912 (512 MB) avail memory = 503738368 (480 MB) iq0: on motherboard i80321: BAR0 = 20000004.00000000 BAR1 = 40000004.00000000 i80219: BAR0 = 20000000.00000000 BAR1 = 40000000.00000000 i80219: I/O Processor, acting as PCI host i80321: SBDR = 0xa0000000 SBR0 = 0x00000018 SBR1 = 0x00000020 i80321: BANK0 = 0x10000000 BANK1 = 0x10000000 i80321: Reserve space for private devices (Inbound Window 1) hi:0x00000000 lo:0x8000000c xlate:0x80000000 size:0x04000000 i80321: RAM access (Inbound Window 2) hi:0x00000000 lo:0xa000000c xlate:0xa0000000 size:0x20000000 i80321: Reserve space for private devices (Outbound Window 1) xlate_hi:0x00000000 xlate_lo:0x80000000 obio0 on iq0 uart0: <16550 or compatible> on obio0 uart0: [FILTER] uart0: console (115200,n,8,1) itimer0: on iq0 iopwdog0: on iq0 pcib0: on iq0 pci0: on pcib0 Device 1 routed to irq 27 Device 2 routed to irq 30 Device 3 routed to irq 29 Device 5 routed to irq 30 Device 5 routed to irq 29 Device 5 routed to irq 27 em0: port 0xfe400000-0xfe40003f mem 0-0x1ffff,0x20000-0x3ffff irq 27 at device 1.0 on pci0 em0: Start: 0x00000000 em0: End: 0x0001FFFF em0: Size: 0x00020000 Fatal kernel mode data abort: 'External Linefetch Abort (P)' trapframe: 0xc00faae0 FSR=00000406, FAR=Invalid, spsr=200000d3 r0 =c00d0400, r1 =cd5bf000, r2 =00000010, r3 =0000000a r4 =c317e008, r5 =cd5bf000, r6 =c00d0400, r7 =c1302070 r8 =c317e008, r9 =c0071180, r10=c317e000, r11=c00fab50 r12=c00fab54, ssp=c00fab2c, slr=c106a96c, pc =c106a968 [thread 
pid 0 tid 100000 ] Stopped at e1000_init_script_state_82541+0x24c: blx r7 db> The only places i can see owin used is in i80321.c and iq80312.c and they only written to in i80312.c if the controller is a slave not a host: if (!sc->sc_is_host) { sc->sc_owin[0].owin_xlate_lo = sc->sc_iwin[1].iwin_base_lo; sc->sc_owin[0].owin_xlate_hi = sc->sc_iwin[1].iwin_base_hi; } I will see if I can compare the global softc with the ones returned by the get function. Thanks Tom From jhb at freebsd.org Tue Sep 29 20:34:49 2009 From: jhb at freebsd.org (John Baldwin) Date: Tue Sep 29 20:34:56 2009 Subject: sx locks and memory barriers In-Reply-To: <3bbf2fe10909291215i2bdd73aj13c1ac433152cab4@mail.gmail.com> References: <20090924224935.GW473@gandalf.sssup.it> <200909291425.46134.jhb@freebsd.org> <3bbf2fe10909291215i2bdd73aj13c1ac433152cab4@mail.gmail.com> Message-ID: <200909291628.15516.jhb@freebsd.org> On Tuesday 29 September 2009 3:15:40 pm Attilio Rao wrote: > 2009/9/29 John Baldwin : > > On Tuesday 29 September 2009 11:39:37 am Attilio Rao wrote: > >> 2009/9/25 Fabio Checconi : > >> > Hi all, > >> > looking at sys/sx.h I have some troubles understanding this comment: > >> > > >> > * A note about memory barriers. Exclusive locks need to use the same > >> > * memory barriers as mutexes: _acq when acquiring an exclusive lock > >> > * and _rel when releasing an exclusive lock. On the other side, > >> > * shared lock needs to use an _acq barrier when acquiring the lock > >> > * but, since they don't update any locked data, no memory barrier is > >> > * needed when releasing a shared lock. 
> >> > > >> > In particular, I'm not understanding what prevents the following sequence > >> > from happening: > >> > > >> > CPU A CPU B > >> > > >> > sx_slock(&data->lock); > >> > > >> > sx_sunlock(&data->lock); > >> > > >> > /* reordered after the unlock > >> > by the cpu */ > >> > if (data->buffer) > >> > sx_xlock(&data->lock); > >> > free(data->buffer); > >> > data->buffer = NULL; > >> > sx_xunlock(&data->lock); > >> > > >> > a = *data->buffer; > >> > > >> > IOW, even if readers do not modify the data protected by the lock, > >> > without a release barrier a memory access may leak past the unlock (as > >> > the cpu won't notice any dependency between the unlock and the fetch, > >> > feeling free to reorder them), thus potentially racing with an exclusive > >> > writer accessing the data. > >> > > >> > On architectures where atomic ops serialize memory accesses this would > >> > never happen, otherwise the sequence above seems possible; am I missing > >> > something? > >> > >> I think your concerns are right, possibly we need this patch: > >> http://www.freebsd.org/~attilio/sxrw_unlockb.diff > > > > Actually, since you are only worried about reads, I think this should be > > an "acq" barrier rather than a "rel". In some cases "acq" is cheaper, so we > > should prefer the cheapest barrier that provides what we need. You would > > still need to keep some language about the memory barriers since using "acq" > > for shared unlocking is different from exclusive unlocking. > > Actually, I don't think that an acq barrier ensures enough protection > against the reordering of 'earlier' operation thus not fixing the > architecture ordering problem reported by Fabio. Also, I don't think > we just have to care about reads (or I don't understand what you mean > here). Hmmm, it might not on certain archs. It does on x86 (i.e. an lfence would work here on x86), but probably not on ia64 and sparc64. Also, we certainly only care about reads. 
A read/share lock cannot resolve any races where the lock holder is writing data, it can only ensure that the lock holder can safely read shared data without the data changing out from underneath it. > However, I'm not even sure that we have faster read barriers than the > write one. As long as it should be true in theory I don't think that's > what happen in practice. bde@ found that sfence was generally much more expensive than lfence on x86. However, since x86 guarantees the order of all stores we don't actually need to use sfence at all on x86 anyway. -- John Baldwin From attilio at freebsd.org Tue Sep 29 20:39:46 2009 From: attilio at freebsd.org (Attilio Rao) Date: Tue Sep 29 20:39:54 2009 Subject: sx locks and memory barriers In-Reply-To: <200909291628.15516.jhb@freebsd.org> References: <20090924224935.GW473@gandalf.sssup.it> <200909291425.46134.jhb@freebsd.org> <3bbf2fe10909291215i2bdd73aj13c1ac433152cab4@mail.gmail.com> <200909291628.15516.jhb@freebsd.org> Message-ID: <3bbf2fe10909291339s5705a9bendb4c9331293b45a4@mail.gmail.com> 2009/9/29 John Baldwin : > On Tuesday 29 September 2009 3:15:40 pm Attilio Rao wrote: >> 2009/9/29 John Baldwin : >> > On Tuesday 29 September 2009 11:39:37 am Attilio Rao wrote: >> >> 2009/9/25 Fabio Checconi : >> >> > Hi all, >> >> > looking at sys/sx.h I have some troubles understanding this comment: >> >> > >> >> > * A note about memory barriers. Exclusive locks need to use the same >> >> > * memory barriers as mutexes: _acq when acquiring an exclusive lock >> >> > * and _rel when releasing an exclusive lock. On the other side, >> >> > * shared lock needs to use an _acq barrier when acquiring the lock >> >> > * but, since they don't update any locked data, no memory barrier is >> >> > * needed when releasing a shared lock. 
>> >> > >> >> > In particular, I'm not understanding what prevents the following sequence >> >> > from happening: >> >> > >> >> > CPU A CPU B >> >> > >> >> > sx_slock(&data->lock); >> >> > >> >> > sx_sunlock(&data->lock); >> >> > >> >> > /* reordered after the unlock >> >> > by the cpu */ >> >> > if (data->buffer) >> >> > sx_xlock(&data->lock); >> >> > free(data->buffer); >> >> > data->buffer = NULL; >> >> > sx_xunlock(&data->lock); >> >> > >> >> > a = *data->buffer; >> >> > >> >> > IOW, even if readers do not modify the data protected by the lock, >> >> > without a release barrier a memory access may leak past the unlock (as >> >> > the cpu won't notice any dependency between the unlock and the fetch, >> >> > feeling free to reorder them), thus potentially racing with an exclusive >> >> > writer accessing the data. >> >> > >> >> > On architectures where atomic ops serialize memory accesses this would >> >> > never happen, otherwise the sequence above seems possible; am I missing >> >> > something? >> >> >> >> I think your concerns are right, possibly we need this patch: >> >> http://www.freebsd.org/~attilio/sxrw_unlockb.diff >> > >> > Actually, since you are only worried about reads, I think this should be >> > an "acq" barrier rather than a "rel". In some cases "acq" is cheaper, so we >> > should prefer the cheapest barrier that provides what we need. You would >> > still need to keep some language about the memory barriers since using "acq" >> > for shared unlocking is different from exclusive unlocking. >> >> Actually, I don't think that an acq barrier ensures enough protection >> against the reordering of 'earlier' operation thus not fixing the >> architecture ordering problem reported by Fabio. Also, I don't think >> we just have to care about reads (or I don't understand what you mean >> here). > > Hmmm, it might not on certain archs. It does on x86 (i.e. an lfence would > work here on x86), but probably not on ia64 and sparc64. 
Also, we certainly > only care about reads. A read/share lock cannot resolve any races where the > lock holder is writing data, it can only ensure that the lock holder can > safely read shared data without the data changing out from underneath it. > >> However, I'm not even sure that we have faster read barriers than the >> write one. As long as it should be true in theory I don't think that's >> what happen in practice. > > bde@ found that sfence was generally much more expensive than lfence on x86. > However, since x86 guarantees the order of all stores we don't actually need > to use sfence at all on x86 anyway. Yes, x86 guarantees that the stores are strong ordered so I don't think acq to be faster than rel. Can I assume the patch I already sent as reviewed by you and commit then, right? Attilio -- Peace can only be achieved by understanding - A. Einstein From attilio at freebsd.org Tue Sep 29 20:42:15 2009 From: attilio at freebsd.org (Attilio Rao) Date: Tue Sep 29 20:42:23 2009 Subject: sx locks and memory barriers In-Reply-To: <200909291953.36373.max@love2party.net> References: <20090924224935.GW473@gandalf.sssup.it> <3bbf2fe10909290839w305c85c3t1532bd7733c39a6a@mail.gmail.com> <200909291953.36373.max@love2party.net> Message-ID: <3bbf2fe10909291342o4d32e381ge23e446582bb2d18@mail.gmail.com> 2009/9/29 Max Laier : > On Tuesday 29 September 2009 17:39:37 Attilio Rao wrote: >> 2009/9/25 Fabio Checconi : >> > Hi all, >> > looking at sys/sx.h I have some troubles understanding this comment: >> > >> > * A note about memory barriers. Exclusive locks need to use the same >> > * memory barriers as mutexes: _acq when acquiring an exclusive lock >> > * and _rel when releasing an exclusive lock. On the other side, >> > * shared lock needs to use an _acq barrier when acquiring the lock >> > * but, since they don't update any locked data, no memory barrier is >> > * needed when releasing a shared lock. 
>> > >> > In particular, I'm not understanding what prevents the following sequence >> > from happening: >> > >> > CPU A CPU B >> > >> > sx_slock(&data->lock); >> > >> > sx_sunlock(&data->lock); >> > >> > /* reordered after the unlock >> > by the cpu */ >> > if (data->buffer) >> > sx_xlock(&data->lock); >> > free(data->buffer); >> > data->buffer = NULL; >> > sx_xunlock(&data->lock); >> > >> > a = *data->buffer; >> > >> > IOW, even if readers do not modify the data protected by the lock, >> > without a release barrier a memory access may leak past the unlock (as >> > the cpu won't notice any dependency between the unlock and the fetch, >> > feeling free to reorder them), thus potentially racing with an exclusive >> > writer accessing the data. >> > >> > On architectures where atomic ops serialize memory accesses this would >> > never happen, otherwise the sequence above seems possible; am I missing >> > something? >> >> I think your concerns are right, possibly we need this patch: >> http://www.freebsd.org/~attilio/sxrw_unlockb.diff >> >> However speaking with John we agreed possibly there is a more serious >> breakage. Possibly, memory barriers would also require to ensure the >> compiler to not reorder the operation, while right now, in FreeBSD, they >> just take care of the reordering from the architecture perspective. >> The only way I'm aware of GCC offers that is to clobber memory. >> I will provide a patch that address this soon, hoping that GCC will be >> smart enough to not overhead too much the memory clobbering but just >> try to understand what's our purpose and servers it (I will try to >> compare code generated before and after the patch at least for tier-1 >> architectures). > > Does GCC really reorder accesses to volatile objects? 
The C Standard seems to > object: > > 5.1.2.3 - 2 > Accessing a volatile object, modifying an object, modifying a file, or calling > a function that does any of those operations are all side effects,11) which > are changes in the state of the execution environment. Evaluation of an > expression may produce side effects. At certain specified points in the > execution sequence called sequence points, all side effects of previous > evaluations shall be complete and no side effects of subsequent evaluations > shall have taken place. (A summary of the sequence points is given in annex > C.) Very interesting. I was thinking about the other operating systems which basically do 'memory clobbering' for ensuring a compiler barrier, but actually they often forsee such a barrier without the conjuction of a memory operand. I think I will need to speak a bit with a GCC engineer in order to see what do they implement in regard of volatile operands. Attilio -- Peace can only be achieved by understanding - A. Einstein From marius at nuenneri.ch Tue Sep 29 20:54:28 2009 From: marius at nuenneri.ch (Marius Nünnerich) Date: Tue Sep 29 20:54:36 2009 Subject: sx locks and memory barriers In-Reply-To: <3bbf2fe10909291215i2bdd73aj13c1ac433152cab4@mail.gmail.com> References: <20090924224935.GW473@gandalf.sssup.it> <3bbf2fe10909290839w305c85c3t1532bd7733c39a6a@mail.gmail.com> <200909291425.46134.jhb@freebsd.org> <3bbf2fe10909291215i2bdd73aj13c1ac433152cab4@mail.gmail.com> Message-ID: On Tue, Sep 29, 2009 at 21:15, Attilio Rao wrote: > 2009/9/29 John Baldwin : >> On Tuesday 29 September 2009 11:39:37 am Attilio Rao wrote: >>> 2009/9/25 Fabio Checconi : >>> > Hi all, >>> > looking at sys/sx.h I have some troubles understanding this comment: >>> > >>> > * A note about memory barriers. Exclusive locks need to use the same >>> > * memory barriers as mutexes: _acq when acquiring an exclusive lock >>> > * and _rel when releasing an exclusive lock.
On the other side, >>> > * shared lock needs to use an _acq barrier when acquiring the lock >>> > * but, since they don't update any locked data, no memory barrier is >>> > * needed when releasing a shared lock. >>> > >>> > In particular, I'm not understanding what prevents the following sequence >>> > from happening: >>> > >>> > CPU A CPU B >>> > >>> > sx_slock(&data->lock); >>> > >>> > sx_sunlock(&data->lock); >>> > >>> > /* reordered after the unlock >>> > by the cpu */ >>> > if (data->buffer) >>> > sx_xlock(&data->lock); >>> > free(data->buffer); >>> > data->buffer = NULL; >>> > sx_xunlock(&data->lock); >>> > >>> > a = *data->buffer; >>> > >>> > IOW, even if readers do not modify the data protected by the lock, >>> > without a release barrier a memory access may leak past the unlock (as >>> > the cpu won't notice any dependency between the unlock and the fetch, >>> > feeling free to reorder them), thus potentially racing with an exclusive >>> > writer accessing the data. >>> > >>> > On architectures where atomic ops serialize memory accesses this would >>> > never happen, otherwise the sequence above seems possible; am I missing >>> > something? >>> >>> I think your concerns are right, possibly we need this patch: >>> http://www.freebsd.org/~attilio/sxrw_unlockb.diff >> >> Actually, since you are only worried about reads, I think this should be >> an "acq" barrier rather than a "rel". In some cases "acq" is cheaper, so we >> should prefer the cheapest barrier that provides what we need. You would >> still need to keep some language about the memory barriers since using "acq" >> for shared unlocking is different from exclusive unlocking.
> > Actually, I don't think that an acq barrier ensures enough protection > against the reordering of 'earlier' operation thus not fixing the > architecture ordering problem reported by Fabio. Also, I don't think > we just have to care about reads (or I don't understand what you mean > here). > However, I'm not even sure that we have faster read barriers than the > write one. As long as it should be true in theory I don't think that's > what happen in practice. > >> The memory clobber is quite heavyweight. It actually forces gcc to forget any >> cached memory items in registers and reload everything again. What I really >> want is just a barrier to tell GCC to not reorder things. If I read a value >> in the program before acquiring a lock it is in theory fine to keep that >> cached across the barrier. However, there isn't a way to do this sort of >> thing with GCC currently. > > Yes, that's the only tool we have right now with GCC. I will try to > look for another way, but it sounds difficult to discover. Even if we would have a mechanism to tell GCC to not reorder the instructions the CPU itself would still be free to reorder if there are no barriers. Or am I missing something?
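To make Fabio's CPU A / CPU B scenario concrete, here is a minimal sketch of a toy shared/exclusive lock in which the shared unlock carries release ordering, roughly the kind of change the thread is debating (a barrier on shared unlock). It is not FreeBSD's sx(9) code: the lock-word layout and every `toy_*` name are hypothetical. It uses GCC's `__atomic` builtins, which postdate this thread, because they constrain both the compiler and the CPU; that is Marius's point, since a compiler-only barrier would not stop the hardware from reordering.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical toy lock word (not FreeBSD's sx(9) layout):
 * bit 31 set = held exclusively, low bits = number of shared holders.
 */
#define TOY_XLOCKED 0x80000000u

struct toy_sx {
	unsigned int word;
};

/* Shared acquire: acquire ordering keeps critical-section reads after it. */
static bool
toy_try_slock(struct toy_sx *l)
{
	unsigned int old = __atomic_load_n(&l->word, __ATOMIC_RELAXED);

	if (old & TOY_XLOCKED)
		return (false);
	return (__atomic_compare_exchange_n(&l->word, &old, old + 1, false,
	    __ATOMIC_ACQUIRE, __ATOMIC_RELAXED));
}

/*
 * Shared release: release ordering keeps every read done under the lock
 * (e.g. "if (data->buffer)") before the decrement, for both the compiler
 * and the CPU.  This is the barrier the sx.h comment calls unnecessary.
 */
static void
toy_sunlock(struct toy_sx *l)
{
	unsigned int old = __atomic_fetch_sub(&l->word, 1, __ATOMIC_RELEASE);

	assert(old != 0 && (old & TOY_XLOCKED) == 0);	/* must hold it shared */
	(void)old;
}

/* Exclusive acquire: succeeds only once all shared holders have left. */
static bool
toy_try_xlock(struct toy_sx *l)
{
	unsigned int expected = 0;

	return (__atomic_compare_exchange_n(&l->word, &expected, TOY_XLOCKED,
	    false, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED));
}

/* Exclusive release: writes made under the lock are published first. */
static void
toy_xunlock(struct toy_sx *l)
{
	__atomic_store_n(&l->word, 0, __ATOMIC_RELEASE);
}
```

With the release decrement in `toy_sunlock`, CPU A's read of `data->buffer` cannot be performed after its unlock becomes visible, so CPU B's acquiring compare-and-swap in `toy_try_xlock` cannot succeed, and B cannot free the buffer, before that read has completed.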
From attilio at freebsd.org Tue Sep 29 20:58:22 2009 From: attilio at freebsd.org (Attilio Rao) Date: Tue Sep 29 20:58:29 2009 Subject: sx locks and memory barriers In-Reply-To: References: <20090924224935.GW473@gandalf.sssup.it> <3bbf2fe10909290839w305c85c3t1532bd7733c39a6a@mail.gmail.com> <200909291425.46134.jhb@freebsd.org> <3bbf2fe10909291215i2bdd73aj13c1ac433152cab4@mail.gmail.com> Message-ID: <3bbf2fe10909291358q3063f763md4ccba88c3b1d0be@mail.gmail.com> 2009/9/29 Marius Nünnerich : > On Tue, Sep 29, 2009 at 21:15, Attilio Rao wrote: >> 2009/9/29 John Baldwin : >>> On Tuesday 29 September 2009 11:39:37 am Attilio Rao wrote: >>>> 2009/9/25 Fabio Checconi : >>>> > Hi all, >>>> > looking at sys/sx.h I have some troubles understanding this comment: >>>> > >>>> > * A note about memory barriers. Exclusive locks need to use the same >>>> > * memory barriers as mutexes: _acq when acquiring an exclusive lock >>>> > * and _rel when releasing an exclusive lock. On the other side, >>>> > * shared lock needs to use an _acq barrier when acquiring the lock >>>> > * but, since they don't update any locked data, no memory barrier is >>>> > * needed when releasing a shared lock. >>>> > >>>> > In particular, I'm not understanding what prevents the following sequence >>>> > from happening: >>>> > >>>> > CPU A CPU B >>>> > >>>> > sx_slock(&data->lock); >>>> > >>>> > sx_sunlock(&data->lock); >>>> > >>>> > /* reordered after the unlock >>>> > by the cpu */ >>>> > if (data->buffer) >>>> > sx_xlock(&data->lock); >>>> > free(data->buffer); >>>> > data->buffer = NULL; >>>> > sx_xunlock(&data->lock); >>>> > >>>> > a = *data->buffer; >>>> > >>>> > IOW, even if readers do not modify the data protected by the lock, >>>> > without a release barrier a memory access may leak past the unlock (as >>>> > the cpu won't notice any dependency between the unlock and the fetch, >>>> > feeling free to reorder them), thus potentially racing with an exclusive >>>> > writer accessing the data.
>>>> > >>>> > On architectures where atomic ops serialize memory accesses this would >>>> > never happen, otherwise the sequence above seems possible; am I missing >>>> > something? >>>> >>>> I think your concerns are right, possibly we need this patch: >>>> http://www.freebsd.org/~attilio/sxrw_unlockb.diff >>> >>> Actually, since you are only worried about reads, I think this should be >>> an "acq" barrier rather than a "rel". In some cases "acq" is cheaper, so we >>> should prefer the cheapest barrier that provides what we need. You would >>> still need to keep some language about the memory barriers since using "acq" >>> for shared unlocking is different from exclusive unlocking. >> >> Actually, I don't think that an acq barrier ensures enough protection >> against the reordering of 'earlier' operation thus not fixing the >> architecture ordering problem reported by Fabio. Also, I don't think >> we just have to care about reads (or I don't understand what you mean >> here). >> However, I'm not even sure that we have faster read barriers than the >> write one. As long as it should be true in theory I don't think that's >> what happen in practice. >> >>> The memory clobber is quite heavyweight. It actually forces gcc to forget any >>> cached memory items in registers and reload everything again. What I really >>> want is just a barrier to tell GCC to not reorder things. If I read a value >>> in the program before acquiring a lock it is in theory fine to keep that >>> cached across the barrier. However, there isn't a way to do this sort of >>> thing with GCC currently. >> >> Yes, that's the only tool we have right now with GCC. I will try to >> look for another way, but it sounds difficult to discover. > > Even if we would have a mechanism to tell GCC to not reorder the > instructions the CPU itself would still be free to reorder if there > are no barriers. Or am I missing something? Our code already takes care of that case in our barriers. 
Attilio -- Peace can only be achieved by understanding - A. Einstein From tinguely at casselton.net Tue Sep 29 19:21:20 2009 From: tinguely at casselton.net (Mark Tinguely) Date: Tue Sep 29 21:08:54 2009 Subject: Help debugging: Fatal kernel mode data abort: 'External Linefetch Abort (P)' In-Reply-To: <4AC21F44.6060004@tomjudge.com> Message-ID: <200909291908.n8TJ8UA2042239@casselton.net> I don't know anything about the code other than what I read today ... It appears from your boot traces the owin[0].owin_xlate_[lo | hi] values should be fine in iq80321.c - a "VERBOSE_INIT_ARM" would confirm it. You might want to test if the "sc" pointer in iq80321.c has the same value as the global "i80321_softc" pointer. You can add those print statements to a "VERBOSE_INIT_ARM". It will tell you if something changed the global pointer or if something overwrote the owin values in the structure. If the global pointer or owin was changed before the pci attach code, you can put the appropriate test into the earlier (obio, uart, itimer, iopwdtimer) attach. None of these attaches use the global "i80321_softc" pointer. --Mark. From jhb at freebsd.org Tue Sep 29 21:32:33 2009 From: jhb at freebsd.org (John Baldwin) Date: Tue Sep 29 21:32:47 2009 Subject: sx locks and memory barriers In-Reply-To: References: <20090924224935.GW473@gandalf.sssup.it> <3bbf2fe10909291215i2bdd73aj13c1ac433152cab4@mail.gmail.com> Message-ID: <200909291721.27755.jhb@freebsd.org> On Tuesday 29 September 2009 4:26:56 pm Marius Nünnerich wrote: > On Tue, Sep 29, 2009 at 21:15, Attilio Rao wrote: > > 2009/9/29 John Baldwin : > >> On Tuesday 29 September 2009 11:39:37 am Attilio Rao wrote: > >>> 2009/9/25 Fabio Checconi : > >>> > Hi all, > >>> > looking at sys/sx.h I have some troubles understanding this comment: > >>> > > >>> > * A note about memory barriers.
Exclusive locks need to use the same > >>> > * memory barriers as mutexes: _acq when acquiring an exclusive lock > >>> > * and _rel when releasing an exclusive lock. On the other side, > >>> > * shared lock needs to use an _acq barrier when acquiring the lock > >>> > * but, since they don't update any locked data, no memory barrier is > >>> > * needed when releasing a shared lock. > >>> > > >>> > In particular, I'm not understanding what prevents the following sequence > >>> > from happening: > >>> > > >>> > CPU A CPU B > >>> > > >>> > sx_slock(&data->lock); > >>> > > >>> > sx_sunlock(&data->lock); > >>> > > >>> > /* reordered after the unlock > >>> > by the cpu */ > >>> > if (data->buffer) > >>> > sx_xlock(&data->lock); > >>> > free(data->buffer); > >>> > data->buffer = NULL; > >>> > sx_xunlock(&data->lock); > >>> > > >>> > a = *data->buffer; > >>> > > >>> > IOW, even if readers do not modify the data protected by the lock, > >>> > without a release barrier a memory access may leak past the unlock (as > >>> > the cpu won't notice any dependency between the unlock and the fetch, > >>> > feeling free to reorder them), thus potentially racing with an exclusive > >>> > writer accessing the data. > >>> > > >>> > On architectures where atomic ops serialize memory accesses this would > >>> > never happen, otherwise the sequence above seems possible; am I missing > >>> > something? > >>> > >>> I think your concerns are right, possibly we need this patch: > >>> http://www.freebsd.org/~attilio/sxrw_unlockb.diff > >> > >> Actually, since you are only worried about reads, I think this should be > >> an "acq" barrier rather than a "rel". In some cases "acq" is cheaper, so we > >> should prefer the cheapest barrier that provides what we need.
You would > >> still need to keep some language about the memory barriers since using "acq" > >> for shared unlocking is different from exclusive unlocking. > > > > Actually, I don't think that an acq barrier ensures enough protection > > against the reordering of 'earlier' operation thus not fixing the > > architecture ordering problem reported by Fabio. Also, I don't think > > we just have to care about reads (or I don't understand what you mean > > here). > > However, I'm not even sure that we have faster read barriers than the > > write one. As long as it should be true in theory I don't think that's > > what happen in practice. > > > >> The memory clobber is quite heavyweight. It actually forces gcc to forget any > >> cached memory items in registers and reload everything again. What I really > >> want is just a barrier to tell GCC to not reorder things. If I read a value > >> in the program before acquiring a lock it is in theory fine to keep that > >> cached across the barrier. However, there isn't a way to do this sort of > >> thing with GCC currently. > > > > Yes, that's the only tool we have right now with GCC. I will try to > > look for another way, but it sounds difficult to discover. > > Even if we would have a mechanism to tell GCC to not reorder the > instructions the CPU itself would still be free to reorder if there > are no barriers. Or am I missing something? No, the thing to do here for the second part is add "memory" clobbers to the existing atomic ops with barriers. It will still require barriers for them to be enforced.
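Following the suggestion to add "memory" clobbers to the atomic ops that carry _acq/_rel barriers, here is a minimal sketch of what such ops might look like with GCC-style inline asm. The `my_*` names are hypothetical and are not the real <machine/atomic.h> API. On x86, plain loads already behave as acquire and plain stores as release, so only the compiler needs restraining and the empty asm with a "memory" clobber suffices; the locked cmpxchg gets the same clobber so GCC cannot move memory accesses across it. Other architectures fall back to GCC's `__atomic` builtins (a later addition to GCC), which emit the hardware barrier too, since the clobber alone does not constrain the CPU.

```c
/*
 * Compiler-only barrier: tells GCC that memory may have changed, so it
 * may not cache values in registers or move loads/stores across it.
 * It emits no instruction and does not constrain the CPU.
 */
#define my_compiler_membar()	__asm__ __volatile__("" ::: "memory")

/* Hypothetical load with acquire semantics. */
static inline unsigned int
my_load_acq_int(volatile unsigned int *p)
{
#if defined(__i386__) || defined(__x86_64__)
	unsigned int v = *p;	/* x86 keeps this load before later accesses */

	my_compiler_membar();	/* ... and now so does the compiler */
	return (v);
#else
	return (__atomic_load_n(p, __ATOMIC_ACQUIRE));
#endif
}

/* Hypothetical store with release semantics. */
static inline void
my_store_rel_int(volatile unsigned int *p, unsigned int v)
{
#if defined(__i386__) || defined(__x86_64__)
	my_compiler_membar();	/* earlier accesses stay before the store */
	*p = v;			/* x86 keeps stores after earlier accesses */
#else
	__atomic_store_n(p, v, __ATOMIC_RELEASE);
#endif
}

/*
 * Hypothetical compare-and-set with acquire semantics: if *p == expect,
 * stores src and returns 1, otherwise returns 0.  The locked cmpxchg is
 * a full barrier for the CPU; the "memory" clobber makes it one for the
 * compiler as well.
 */
static inline int
my_cmpset_acq_int(volatile unsigned int *p, unsigned int expect,
    unsigned int src)
{
#if defined(__i386__) || defined(__x86_64__)
	unsigned char res;

	__asm__ __volatile__(
	    "lock; cmpxchgl %3, %1\n\t"
	    "sete %0"
	    : "=q" (res), "+m" (*p), "+a" (expect)
	    : "r" (src)
	    : "memory", "cc");
	return (res);
#else
	return (__atomic_compare_exchange_n(p, &expect, src, 0,
	    __ATOMIC_ACQUIRE, __ATOMIC_RELAXED));
#endif
}
```

The clobber costs what John describes (GCC reloads anything it had cached in registers), but it is confined to the ops that already promise ordering.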
-- John Baldwin From jhb at freebsd.org Tue Sep 29 21:32:34 2009 From: jhb at freebsd.org (John Baldwin) Date: Tue Sep 29 21:32:48 2009 Subject: sx locks and memory barriers In-Reply-To: <3bbf2fe10909291342o4d32e381ge23e446582bb2d18@mail.gmail.com> References: <20090924224935.GW473@gandalf.sssup.it> <200909291953.36373.max@love2party.net> <3bbf2fe10909291342o4d32e381ge23e446582bb2d18@mail.gmail.com> Message-ID: <200909291731.32394.jhb@freebsd.org> On Tuesday 29 September 2009 4:42:13 pm Attilio Rao wrote: > 2009/9/29 Max Laier : > > On Tuesday 29 September 2009 17:39:37 Attilio Rao wrote: > >> 2009/9/25 Fabio Checconi : > >> > Hi all, > >> > looking at sys/sx.h I have some troubles understanding this comment: > >> > > >> > * A note about memory barriers. Exclusive locks need to use the same > >> > * memory barriers as mutexes: _acq when acquiring an exclusive lock > >> > * and _rel when releasing an exclusive lock. On the other side, > >> > * shared lock needs to use an _acq barrier when acquiring the lock > >> > * but, since they don't update any locked data, no memory barrier is > >> > * needed when releasing a shared lock. > >> > > >> > In particular, I'm not understanding what prevents the following sequence > >> > from happening: > >> > > >> > CPU A CPU B > >> > > >> > sx_slock(&data->lock); > >> > > >> > sx_sunlock(&data->lock); > >> > > >> > /* reordered after the unlock > >> > by the cpu */ > >> > if (data->buffer) > >> > sx_xlock(&data->lock); > >> > free(data->buffer); > >> > data->buffer = NULL; > >> > sx_xunlock(&data->lock); > >> > > >> > a = *data->buffer; > >> > > >> > IOW, even if readers do not modify the data protected by the lock, > >> > without a release barrier a memory access may leak past the unlock (as > >> > the cpu won't notice any dependency between the unlock and the fetch, > >> > feeling free to reorder them), thus potentially racing with an exclusive > >> > writer accessing the data. 
> >> > > >> > On architectures where atomic ops serialize memory accesses this would > >> > never happen, otherwise the sequence above seems possible; am I missing > >> > something? > >> > >> I think your concerns are right, possibly we need this patch: > >> http://www.freebsd.org/~attilio/sxrw_unlockb.diff > >> > >> However speaking with John we agreed possibly there is a more serious > >> breakage. Possibly, memory barriers would also require to ensure the > >> compiler to not reorder the operation, while right now, in FreeBSD, they > >> just take care of the reordering from the architecture perspective. > >> The only way I'm aware of GCC offers that is to clobber memory. > >> I will provide a patch that address this soon, hoping that GCC will be > >> smart enough to not overhead too much the memory clobbering but just > >> try to understand what's our purpose and servers it (I will try to > >> compare code generated before and after the patch at least for tier-1 > >> architectures). > > > > Does GCC really reorder accesses to volatile objects? The C Standard seems to > > object: > > > > 5.1.2.3 - 2 > > Accessing a volatile object, modifying an object, modifying a file, or calling > > a function that does any of those operations are all side effects,11) which > > are changes in the state of the execution environment. Evaluation of an > > expression may produce side effects. At certain specified points in the > > execution sequence called sequence points, all side effects of previous > > evaluations shall be complete and no side effects of subsequent evaluations > > shall have taken place. (A summary of the sequence points is given in annex > > C.) > > Very interesting. > I was thinking about the other operating systems which basically do > 'memory clobbering' for ensuring a compiler barrier, but actually they > often forsee such a barrier without the conjuction of a memory > operand. 
> > I think I will need to speak a bit with a GCC engineer in order to see > what do they implement in regard of volatile operands. GCC can be quite aggressive with reordering even in the face of volatile. I was recently doing a hack to export some data from the kernel to userland that used a spin loop to grab a snapshot of the contents of a structure similar to the method used in the kernel with the timehands structures. It used a volatile structure exposed from the kernel that looked something like: struct foo { volatile int gen; /* other stuff */ }; volatile struct foo *p; do { x = p->gen; /* read other stuff */ y = p->gen; } while (x != y && x != 0); GCC moved the 'y = ' up into the middle of the '/* read other stuff */'. I eventually had to add explicit "memory" clobbers to force GCC to not move the reads of 'gen' around but do them "around" all the other operations, so that the working code is: do { x = p->gen; asm volatile("" ::: "memory"); /* read other stuff */ asm volatile("" ::: "memory"); y = p->gen; } while (x != y && x != 0); -- John Baldwin From attilio at freebsd.org Tue Sep 29 21:39:46 2009 From: attilio at freebsd.org (Attilio Rao) Date: Tue Sep 29 21:39:53 2009 Subject: sx locks and memory barriers In-Reply-To: <200909291731.32394.jhb@freebsd.org> References: <20090924224935.GW473@gandalf.sssup.it> <200909291953.36373.max@love2party.net> <3bbf2fe10909291342o4d32e381ge23e446582bb2d18@mail.gmail.com> <200909291731.32394.jhb@freebsd.org> Message-ID: <3bbf2fe10909291439x21f53e34n60d63554b1dea0de@mail.gmail.com> 2009/9/29 John Baldwin : > On Tuesday 29 September 2009 4:42:13 pm Attilio Rao wrote: >> 2009/9/29 Max Laier : >> > On Tuesday 29 September 2009 17:39:37 Attilio Rao wrote: >> >> 2009/9/25 Fabio Checconi : >> >> > Hi all, >> >> > looking at sys/sx.h I have some troubles understanding this comment: >> >> > >> >> > * A note about memory barriers. 
Exclusive locks need to use the same >> >> > * memory barriers as mutexes: _acq when acquiring an exclusive lock >> >> > * and _rel when releasing an exclusive lock. On the other side, >> >> > * shared lock needs to use an _acq barrier when acquiring the lock >> >> > * but, since they don't update any locked data, no memory barrier is >> >> > * needed when releasing a shared lock. >> >> > >> >> > In particular, I'm not understanding what prevents the following sequence >> >> > from happening: >> >> > >> >> > CPU A CPU B >> >> > >> >> > sx_slock(&data->lock); >> >> > >> >> > sx_sunlock(&data->lock); >> >> > >> >> > /* reordered after the unlock >> >> > by the cpu */ >> >> > if (data->buffer) >> >> > sx_xlock(&data->lock); >> >> > free(data->buffer); >> >> > data->buffer = NULL; >> >> > sx_xunlock(&data->lock); >> >> > >> >> > a = *data->buffer; >> >> > >> >> > IOW, even if readers do not modify the data protected by the lock, >> >> > without a release barrier a memory access may leak past the unlock (as >> >> > the cpu won't notice any dependency between the unlock and the fetch, >> >> > feeling free to reorder them), thus potentially racing with an exclusive >> >> > writer accessing the data. >> >> > >> >> > On architectures where atomic ops serialize memory accesses this would >> >> > never happen, otherwise the sequence above seems possible; am I missing >> >> > something? >> >> >> >> I think your concerns are right, possibly we need this patch: >> >> http://www.freebsd.org/~attilio/sxrw_unlockb.diff >> >> >> >> However speaking with John we agreed possibly there is a more serious >> >> breakage. Possibly, memory barriers would also require to ensure the >> >> compiler to not reorder the operation, while right now, in FreeBSD, they >> >> just take care of the reordering from the architecture perspective. >> >> The only way I'm aware of GCC offers that is to clobber memory. 
>> >> I will provide a patch that address this soon, hoping that GCC will be >> >> smart enough to not overhead too much the memory clobbering but just >> >> try to understand what's our purpose and servers it (I will try to >> >> compare code generated before and after the patch at least for tier-1 >> >> architectures). >> > >> > Does GCC really reorder accesses to volatile objects? The C Standard seems to >> > object: >> > >> > 5.1.2.3 - 2 >> > Accessing a volatile object, modifying an object, modifying a file, or calling >> > a function that does any of those operations are all side effects,11) which >> > are changes in the state of the execution environment. Evaluation of an >> > expression may produce side effects. At certain specified points in the >> > execution sequence called sequence points, all side effects of previous >> > evaluations shall be complete and no side effects of subsequent evaluations >> > shall have taken place. (A summary of the sequence points is given in annex >> > C.) >> >> Very interesting. >> I was thinking about the other operating systems which basically do >> 'memory clobbering' for ensuring a compiler barrier, but actually they >> often forsee such a barrier without the conjuction of a memory >> operand. >> >> I think I will need to speak a bit with a GCC engineer in order to see >> what do they implement in regard of volatile operands. > > GCC can be quite aggressive with reordering even in the face of volatile. I > was recently doing a hack to export some data from the kernel to userland > that used a spin loop to grab a snapshot of the contents of a structure > similar to the method used in the kernel with the timehands structures. 
It > used a volatile structure exposed from the kernel that looked something > like: > > struct foo { > volatile int gen; > /* other stuff */ > }; > > volatile struct foo *p; > > do { > x = p->gen; > /* read other stuff */ > y = p->gen; > } while (x != y && x != 0); > > GCC moved the 'y = ' up into the middle of the '/* read other stuff */'. > I eventually had to add explicit "memory" clobbers to force GCC to not > move the reads of 'gen' around but do them "around" all the other > operations, so that the working code is: > > do { > x = p->gen; > asm volatile("" ::: "memory"); > /* read other stuff */ > asm volatile("" ::: "memory"); > y = p->gen; > } while (x != y && x != 0); > I see. So probabilly clobbering memory is the only choice we have right now. I will try to make a patch which also keeps into account the possibility to skip it (or define by hand alternative approaches) for different compilers. I wonder, specifically, how llvm/clang relies with it. Attilio -- Peace can only be achieved by understanding - A. Einstein From jhb at freebsd.org Wed Sep 30 13:07:28 2009 From: jhb at freebsd.org (John Baldwin) Date: Wed Sep 30 13:07:42 2009 Subject: sx locks and memory barriers In-Reply-To: <3bbf2fe10909291439x21f53e34n60d63554b1dea0de@mail.gmail.com> References: <20090924224935.GW473@gandalf.sssup.it> <200909291731.32394.jhb@freebsd.org> <3bbf2fe10909291439x21f53e34n60d63554b1dea0de@mail.gmail.com> Message-ID: <200909300759.29141.jhb@freebsd.org> On Tuesday 29 September 2009 5:39:43 pm Attilio Rao wrote: > 2009/9/29 John Baldwin : > > On Tuesday 29 September 2009 4:42:13 pm Attilio Rao wrote: > >> 2009/9/29 Max Laier : > >> > On Tuesday 29 September 2009 17:39:37 Attilio Rao wrote: > >> >> 2009/9/25 Fabio Checconi : > >> >> > Hi all, > >> >> > looking at sys/sx.h I have some troubles understanding this comment: > >> >> > > >> >> > * A note about memory barriers. 
Exclusive locks need to use the same > >> >> > * memory barriers as mutexes: _acq when acquiring an exclusive lock > >> >> > * and _rel when releasing an exclusive lock. On the other side, > >> >> > * shared lock needs to use an _acq barrier when acquiring the lock > >> >> > * but, since they don't update any locked data, no memory barrier is > >> >> > * needed when releasing a shared lock. > >> >> > > >> >> > In particular, I'm not understanding what prevents the following sequence > >> >> > from happening: > >> >> > > >> >> > CPU A CPU B > >> >> > > >> >> > sx_slock(&data->lock); > >> >> > > >> >> > sx_sunlock(&data->lock); > >> >> > > >> >> > /* reordered after the unlock > >> >> > by the cpu */ > >> >> > if (data->buffer) > >> >> > sx_xlock(&data->lock); > >> >> > free(data->buffer); > >> >> > data->buffer = NULL; > >> >> > sx_xunlock(&data->lock); > >> >> > > >> >> > a = *data->buffer; > >> >> > > >> >> > IOW, even if readers do not modify the data protected by the lock, > >> >> > without a release barrier a memory access may leak past the unlock (as > >> >> > the cpu won't notice any dependency between the unlock and the fetch, > >> >> > feeling free to reorder them), thus potentially racing with an exclusive > >> >> > writer accessing the data. > >> >> > > >> >> > On architectures where atomic ops serialize memory accesses this would > >> >> > never happen, otherwise the sequence above seems possible; am I missing > >> >> > something? > >> >> > >> >> I think your concerns are right, possibly we need this patch: > >> >> http://www.freebsd.org/~attilio/sxrw_unlockb.diff > >> >> > >> >> However speaking with John we agreed possibly there is a more serious > >> >> breakage. Possibly, memory barriers would also require to ensure the > >> >> compiler to not reorder the operation, while right now, in FreeBSD, they > >> >> just take care of the reordering from the architecture perspective. > >> >> The only way I'm aware of GCC offers that is to clobber memory. 
> >> >> I will provide a patch that address this soon, hoping that GCC will be > >> >> smart enough to not overhead too much the memory clobbering but just > >> >> try to understand what's our purpose and servers it (I will try to > >> >> compare code generated before and after the patch at least for tier-1 > >> >> architectures). > >> > > >> > Does GCC really reorder accesses to volatile objects? The C Standard seems to > >> > object: > >> > > >> > 5.1.2.3 - 2 > >> > Accessing a volatile object, modifying an object, modifying a file, or calling > >> > a function that does any of those operations are all side effects,11) which > >> > are changes in the state of the execution environment. Evaluation of an > >> > expression may produce side effects. At certain specified points in the > >> > execution sequence called sequence points, all side effects of previous > >> > evaluations shall be complete and no side effects of subsequent evaluations > >> > shall have taken place. (A summary of the sequence points is given in annex > >> > C.) > >> > >> Very interesting. > >> I was thinking about the other operating systems which basically do > >> 'memory clobbering' for ensuring a compiler barrier, but actually they > >> often forsee such a barrier without the conjuction of a memory > >> operand. > >> > >> I think I will need to speak a bit with a GCC engineer in order to see > >> what do they implement in regard of volatile operands. > > > > GCC can be quite aggressive with reordering even in the face of volatile. I > > was recently doing a hack to export some data from the kernel to userland > > that used a spin loop to grab a snapshot of the contents of a structure > > similar to the method used in the kernel with the timehands structures. 
It > > used a volatile structure exposed from the kernel that looked something > > like: > > > > struct foo { > > volatile int gen; > > /* other stuff */ > > }; > > > > volatile struct foo *p; > > > > do { > > x = p->gen; > > /* read other stuff */ > > y = p->gen; > > } while (x != y && x != 0); > > > > GCC moved the 'y = ' up into the middle of the '/* read other stuff */'. > > I eventually had to add explicit "memory" clobbers to force GCC to not > > move the reads of 'gen' around but do them "around" all the other > > operations, so that the working code is: > > > > do { > > x = p->gen; > > asm volatile("" ::: "memory"); > > /* read other stuff */ > > asm volatile("" ::: "memory"); > > y = p->gen; > > } while (x != y && x != 0); > > > > I see. > So probabilly clobbering memory is the only choice we have right now. > I will try to make a patch which also keeps into account the > possibility to skip it (or define by hand alternative approaches) for > different compilers. > I wonder, specifically, how llvm/clang relies with it. We already allow for different compilers to defined different versions of atomic_*(). I think all you need to do for now is just add "memory" clobbers to all of the atomic operations that have either an _acq or _rel memory barrier. -- John Baldwin
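The generation-count snapshot from John's example can also be sketched with GCC's `__atomic` builtins instead of `volatile` plus empty asm statements. The empty asm with a "memory" clobber only stops the compiler; that is enough for load ordering on x86 but not on weakly ordered CPUs, so this sketch uses acquire loads and fences, which cover both. Assumptions: `struct snap`, `snap_write` and `snap_read` are hypothetical names, there is a single writer, and a generation of 0 means an update is in progress (assumed to follow the kernel's timehands convention), so the reader retries while `g1 == 0 || g1 != g2`. That retry condition differs from the `x != y && x != 0` test in the quoted loop.

```c
/*
 * Hypothetical snapshot record: gen plus the "other stuff" payload.
 * gen == 0 means an update is in progress.
 */
struct snap {
	unsigned int gen;
	int a;
	int b;
};

/* Writer side; a single writer is assumed. */
static void
snap_write(struct snap *s, int a, int b)
{
	unsigned int next = __atomic_load_n(&s->gen, __ATOMIC_RELAXED) + 1;

	if (next == 0)
		next = 1;		/* 0 is reserved for "in progress" */
	__atomic_store_n(&s->gen, 0, __ATOMIC_RELAXED);
	__atomic_thread_fence(__ATOMIC_RELEASE);	/* gen = 0 before payload */
	__atomic_store_n(&s->a, a, __ATOMIC_RELAXED);
	__atomic_store_n(&s->b, b, __ATOMIC_RELAXED);
	__atomic_store_n(&s->gen, next, __ATOMIC_RELEASE); /* payload before new gen */
}

/* Reader side: retry until one stable, non-zero generation brackets the reads. */
static void
snap_read(struct snap *s, int *a, int *b)
{
	unsigned int g1, g2;

	do {
		g1 = __atomic_load_n(&s->gen, __ATOMIC_ACQUIRE);
		*a = __atomic_load_n(&s->a, __ATOMIC_RELAXED);
		*b = __atomic_load_n(&s->b, __ATOMIC_RELAXED);
		__atomic_thread_fence(__ATOMIC_ACQUIRE); /* payload before re-reading gen */
		g2 = __atomic_load_n(&s->gen, __ATOMIC_RELAXED);
	} while (g1 == 0 || g1 != g2);
}
```

`snap_read` must not be called before the first `snap_write`: a zeroed record reads as "update in progress" and the loop would spin.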