[PATCH] Support for large number of md(4) disks
sobomax at portaone.com
Wed Jan 18 21:25:25 PST 2006
Wojciech A. Koszek wrote:
> On Mon, Jan 16, 2006 at 01:56:29PM -0800, Maxim Sobolev wrote:
> Hi Maxim,
>> IMHO there is a better approach to fetching an unknown amount of data from
>> the kernel using the ioctl(2) facility. The main idea is that you allocate a
>> buffer of a size sufficient in 95% of cases (for md(4) I think 8-16
>> entries are enough), attach it to a structure which has the size of the
>> buffer as one of its members, and send a pointer to that structure as an
>> argument to ioctl(2).
>> Upon receiving this structure the kernel compares the size of the buffer
>> with the amount of information that it needs to send back. If the buffer
>> size is sufficient to hold this information, it copies it out and returns
>> the number of entries in the buffer as one of the members of this structure.
> I don't like using an array member for holding additional data. We have
> something similar right now with md_pad. I wanted to prevent us from
> doing it once again. To do it right, we'd have to add yet another
> structure describing the size of the list with a pointer to the list of
> disks, and another one describing the individual disks.. but 
>> If the buffer size is insufficient, the kernel fills in the desired size of
>> the buffer in the structure and returns an error code indicating
>> that the provided buffer is insufficient. Upon receiving this error,
>> userland increases the buffer size to the size suggested by the kernel
>> (perhaps adding some extra space) and repeats the ioctl(2) call.
> I believe both methods are acceptable since we always end up with a
> sysctl(3)-like problem. The solution you've described will give us one
> ioctl() call in the positive case, but are there any other advantages?
Yes, there is a difference. I don't like your approach, where you
try to win the race a fixed number of times (5) and then just bail
out. An asymptotic approach is better IMHO, especially considering that
memory is cheap nowadays and you won't have any problems allocating
space for many thousands of configuration entries, even in the case when
you are really going to use only a few of them.
Regarding your assumption that the total number of devices is unlikely to
change quickly, I don't quite agree. A simple script can easily make the
number of md(4) devices go up/down by a few hundred per second, and
your approach will behave erratically in such a case.
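
To make the scheme concrete, here is a rough sketch of the grow-and-retry
loop. It is not the md(4) interface: struct md_list_req, fake_md_list() and
md_list_fetch() are names made up for illustration, the "kernel" side is an
ordinary function standing in for the ioctl(2) handler, and ENOSPC is an
arbitrary choice for the "buffer too small" error.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical request structure; NOT the real md(4) ioctl argument. */
struct md_list_req {
	int	*units;		/* caller-supplied buffer of unit numbers */
	size_t	 capacity;	/* in: number of entries the buffer can hold */
	size_t	 count;		/* out: entries written, or entries needed */
};

/* Stand-in for the kernel's table of configured devices. */
static int	fake_units[1024];
static size_t	fake_nunits;

/*
 * Stand-in for the kernel side of the ioctl.  If the buffer is too small,
 * report the needed size and fail; otherwise copy the list out.
 */
static int
fake_md_list(struct md_list_req *req)
{
	if (req->capacity < fake_nunits) {
		req->count = fake_nunits;	/* suggested size */
		return (ENOSPC);
	}
	memcpy(req->units, fake_units, fake_nunits * sizeof(int));
	req->count = fake_nunits;
	return (0);
}

/*
 * Userland side: start with a small buffer and grow it to whatever the
 * "kernel" asks for (plus some slack) until the call succeeds.  There is
 * no fixed retry limit; the loop ends on success or on a real error.
 */
static int
md_list_fetch(int **unitsp, size_t *countp)
{
	struct md_list_req req;
	int error, *tmp;

	assert(unitsp != NULL && countp != NULL);
	req.units = NULL;
	req.capacity = 16;		/* enough for the common case */
	req.count = 0;
	for (;;) {
		tmp = realloc(req.units, req.capacity * sizeof(int));
		if (tmp == NULL) {
			free(req.units);
			return (ENOMEM);
		}
		req.units = tmp;
		error = fake_md_list(&req);
		if (error == 0)
			break;
		if (error != ENOSPC) {
			free(req.units);
			return (error);
		}
		/* Suggested size plus 50% slack in case the list grows. */
		req.capacity = req.count + req.count / 2 + 1;
	}
	*unitsp = req.units;
	*countp = req.count;
	return (0);
}
```

In the common case this costs a single call. When the list is growing
underneath us, each retry asks for more room than the kernel last reported,
so the loop catches up instead of giving up after N attempts. The caller
frees the returned buffer.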
>  cases in which total device number will change are as probable as
> using more than 100 md(4) disks ;-) This is why I decided to use simple
> request for a size and to do a request for md(4) list.