svn commit: r303746 - head/usr.bin/indent

Sun Aug 7 08:20:40 UTC 2016

On 2016-08-06 22:18, Pedro Giffuni wrote:
> On 06/08/2016 15:13, Ed Schouten wrote:
>> 2016-08-04 17:27 GMT+02:00 Pedro F. Giffuni <pfg at freebsd.org>:
>>> Log:
>>>    indent(1): Use bsearch() for looking up type keywords.
>> You're never doing any deletions, right? Would it make more sense to
>> use hsearch_r() in that case?
>>
> Indeed, good idea, although it may be an issue if portability to Windows
> is desired.

I think I could have done better job describing the reasons behind that 
commit, because there are two of those.

The main thing is that I wanted to get rid of arbitrary limit of 16384 
"specials" (keywords, including typedef names) -- and the 
malloc/realloc/bsearch based implementation of looking up type keywords 
solved that, while being relatively simple and easy to follow.

It was also very nice to notice a significant performance improvement 
since I would re-indent whole src/ directory of the PostgreSQL project 
many times while testing changes I made. But it's not that important 
after all, because PostgreSQL sources get re-indented very rarely.

Having said that, I felt compelled to compare performance of bsearch() 
to that of hsearch().

First thing to say here is that I did the testing on a Linux+glibc 
platform, because 1) that's my main development platform, and 2) it's 
probably the only one where anybody re-indents hundreds of thousands of 
lines of code with indent(1) in one go, often, and multiple times a day 
-- and therefore can actually notice the difference.

Secondly, while hsearch_r() being a GNU extension is an argument against 
using it, in this particular case there would be no advantage to be 
gained by preferring the re-entrant version, I believe -- which renders 
the lower portability argument moot. So what I tested was hsearch().

And my implementation of hsearch() based typedef name lookup (diff 
attached) wasn't significantly faster than bsearch(). Obviously I tested 
the glibc version and perhaps FreeBSD hsearch() is faster, but I don't 
expect it to be significantly faster.

The most important conclusion, though, is that at least the glibc 
implementation imposes an arbitrary limit on how many items you can add 
to the data structure:
	The argument nel specifies the maximum number of entries in the table. 
(This  maximum  cannot be changed later, so choose it wisely.)
Adding new items to the data structure (or searching them - I didn't 
investigate) actually _stopped working_ when approximately /nel/ number 
of elements was reached (I accidentally set it to 2000 while PostgreSQL 
has nearly 3000 typedef names). I don't know if FreeBSD's implementation 
does the same, but it's a show-stopper to me anyway, so personally I 
won't be pursuing this idea of replacing bsearch() with hsearch().

-------------- next part --------------
A non-text attachment was scrubbed...
Name: lexi.c.diff
Type: text/x-patch
Size: 2121 bytes
Desc: lexi.c.diff
URL: <http://lists.freebsd.org/pipermail/svn-src-head/attachments/20160807/f038d87c/attachment.bin>