you are in an fs with millions of small files

Charles Swiger cswiger at mac.com
Wed Jun 8 16:06:45 GMT 2005


On Jun 8, 2005, at 3:52 AM, Colin Percival wrote:
> Giorgos Keramidas wrote:
>> On 2005-06-08 09:25, Dag-Erling Smørgrav <des at des.no> wrote:
>>> That's because fts's sorting code is brain-dead.  It starts by
>>> reading the entire directory into a linked list, then copies that
>>> list into an array which it passes to qsort(), and finally converts
>>> the array back into a linked list.
>>
>> Is there a better way to sort a linked list?
>
> How do you define "better"?

Most people think "better" means "it runs faster".  :-)

> You can merge-sort a singly-linked list quite easily, but converting
> it to an array and back would probably be faster.

Ugh.  Converting a list to an array and back simply to sort requires
roughly twice the working memory, and the work spent on the conversion
is time not spent actually sorting anything.
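Merge-sorting the list in place avoids the copy entirely.  A minimal
sketch, using a hypothetical node type for illustration (fts of course
carries its own FTSENT records, not this struct):

```c
#include <stddef.h>

/* Hypothetical node type for illustration only. */
struct node {
	int key;
	struct node *next;
};

/* Splice two sorted lists together by relinking pointers; no
 * element is ever copied. */
static struct node *
merge(struct node *a, struct node *b)
{
	struct node head, *tail = &head;

	while (a != NULL && b != NULL) {
		if (a->key <= b->key) {
			tail->next = a;
			a = a->next;
		} else {
			tail->next = b;
			b = b->next;
		}
		tail = tail->next;
	}
	tail->next = (a != NULL) ? a : b;
	return (head.next);
}

/* Top-down merge sort: split with slow/fast pointers, sort each
 * half recursively, merge.  O(n log n), O(1) extra space per level. */
struct node *
merge_sort(struct node *list)
{
	struct node *slow, *fast, *second;

	if (list == NULL || list->next == NULL)
		return (list);

	slow = list;
	fast = list->next;
	while (fast != NULL && fast->next != NULL) {
		slow = slow->next;
		fast = fast->next->next;
	}
	second = slow->next;
	slow->next = NULL;

	return (merge(merge_sort(list), merge_sort(second)));
}
```

The only memory beyond the list itself is the recursion stack, which
is O(log n).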

It's best to pick one data structure based on what the task requires
and what the access patterns are.  If you know that the number of
elements is fixed and will never change, arrays are fine, but if the
number of elements changes dynamically, something like a list or
red-black tree is a better bet.  A list is more suitable for linear
traversal, whereas an RB-tree handles random access better.

qsort() is wonderful, but as the size of individual data elements  
grows, the overhead of copying them around in the array becomes large  
enough that you are better off rearranging list pointers rather than  
using an array representation.  A long time ago, the break-even point  
for using lists rather than arrays was somewhere around sizeof(el) >  
64, but it's been a while....
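One way to keep qsort() while dodging the copy cost is to sort an
array of pointers to the elements instead of the elements themselves,
so each swap moves one pointer rather than the whole record.  A sketch
under assumptions (struct record and its size are made up; the actual
break-even point depends on your libc and hardware):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical large record; copying these around in qsort's swaps
 * is what gets expensive as the element size grows. */
struct record {
	char	payload[128];
	int	key;
};

/* Comparator dereferences through the pointer array to the records. */
static int
cmp_ptr(const void *pa, const void *pb)
{
	const struct record *a = *(const struct record *const *)pa;
	const struct record *b = *(const struct record *const *)pb;

	return ((a->key > b->key) - (a->key < b->key));
}

/* qsort swaps sizeof(struct record *) bytes per move instead of
 * sizeof(struct record); the records themselves never move. */
void
sort_by_pointer(struct record **view, size_t n)
{
	qsort(view, n, sizeof(struct record *), cmp_ptr);
}
```

This still costs an extra pointer per element, so for a list that is
already linked it is usually simpler to just relink the list.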

-- 
-Chuck



More information about the freebsd-fs mailing list