svn commit: r221853 - in head/sys: dev/md dev/null sys vm
Bruce Evans
brde at optusnet.com.au
Mon May 30 17:52:55 UTC 2011
On Mon, 30 May 2011 mdf at FreeBSD.org wrote:
> On Mon, May 30, 2011 at 8:25 AM, Bruce Evans <brde at optusnet.com.au> wrote:
>> On Sat, 28 May 2011 mdf at FreeBSD.org wrote:
>>> ...
>>> Meanwhile you could try setting ZERO_REGION_SIZE to PAGE_SIZE and I
>>> think that will restore things to the original performance.
>>
>> Using /dev/zero always thrashes caches by the amount <source buffer
>> size> + <target buffer size> (unless the arch uses nontemporal memory
>> accesses for uiomove, which none do AFAIK). So a large source buffer
>> is always just a pessimization. A large target buffer size is also a
>> pessimization, but for the target buffer a fairly large size is needed
>> to amortize the large syscall costs. In this PR, the target buffer
>> size is 64K. ZERO_REGION_SIZE is 64K on i386 and 2M on amd64. 64K+64K
>> on i386 is good for thrashing the L1 cache.
>
> That depends -- is the cache virtually or physically addressed? The
> zero_region only has 4k (PAGE_SIZE) of unique physical addresses. So
> most of the cache thrashing is due to the user-space buffer, if the
> cache is physically addressed.
Oops. I now remember thinking that the much larger source buffer would be
OK since it only uses 1 physical page. But it is apparently virtually
addressed.
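
The difference between the two cases can be illustrated with a small userspace sketch. It assumes a 64-byte cache line and a 4K page (neither is stated in the thread), and it models the zero region as many virtual pages that all alias a single physical page. The function name and constants are hypothetical, and the model ignores associativity and everything else a real cache does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define LINE_SIZE		64	/* assumed cache line size */
#define PAGE_SIZE_ASSUMED	4096	/* assumed page size */

/*
 * Count the distinct cache lines touched when reading region_size bytes
 * of a zero region whose virtual pages all map to one physical page.
 * If physically_tagged is nonzero, lines are identified by physical
 * address; otherwise by virtual address.
 */
static size_t
distinct_lines_touched(size_t region_size, int physically_tagged)
{
	size_t nlines = region_size / LINE_SIZE;
	unsigned char *seen = calloc(nlines ? nlines : 1, 1);
	size_t va, count = 0;

	assert(seen != NULL);
	for (va = 0; va < region_size; va += LINE_SIZE) {
		/* Every virtual page aliases the same physical page. */
		size_t addr = physically_tagged ? va % PAGE_SIZE_ASSUMED : va;
		size_t line = addr / LINE_SIZE;

		if (!seen[line]) {
			seen[line] = 1;
			count++;
		}
	}
	free(seen);
	return (count);
}
```

Under these assumptions a 64K source region occupies 64 lines (4K) if lines are identified physically, and 1024 lines (64K) if they are identified virtually.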
>> It will only have a
>> noticeable impact on a current L2 cache in competition with other
>> threads. It is hard to fit everything in the L1 cache even with
>> non-bloated buffer sizes and 1 thread (16 for the source (I)cache, 0
>> for the source (D)cache and 4K for the target cache might work). On
>> amd64, 2M+2M is good for thrashing most L2 caches. In this PR, the
>> thrashing is limited by the target buffer size to about 64K+64K, up
>> from 4K+64K, and it is marginal whether the extra thrashing from the
>> larger source buffer makes much difference.
>>
>> The old zbuf source buffer size of PAGE_SIZE was already too large.
>
> Wouldn't this depend on how far down from the use of the buffer the
> actual copy happens? Another advantage to a large virtual buffer is
> that it reduces the number of times the copy loop in uiomove has to
> return up to the device layer that initiated the copy. This is all
> pretty fast, but again assuming a physical cache fewer trips is
> better.
Yes, I had forgotten that I have to keep going back to the uiomove()
level for each iteration. That's a lot of overhead although not nearly
as much as going back to the user level. If this is actually important
to optimize, then I might add a repeat count to uiomove() and copyout()
(actually a different function for the latter).
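
To make the per-iteration cost concrete, here is a userspace stand-in for the copy loop, not the kernel code itself. The function name is hypothetical and memcpy() stands in for uiomove(); the sketch only counts how many separate copy calls are needed to satisfy one request from a source region of a given size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the device read loop: fill dst with resid
 * bytes copied from a zero-filled region of region_size bytes, and
 * return the number of separate copy calls that were needed.
 */
static size_t
zero_fill_chunked(char *dst, size_t resid, const char *region,
    size_t region_size)
{
	size_t calls = 0;

	assert(region_size > 0);
	while (resid > 0) {
		size_t len = resid < region_size ? resid : region_size;

		memcpy(dst, region, len);	/* stands in for uiomove() */
		dst += len;
		resid -= len;
		calls++;
	}
	return (calls);
}
```

For the 64K target buffer discussed in this thread, a 4K source region needs 16 calls and a 64K source region needs 1.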
linux-2.6.10 uses a mmapped /dev/zero and has had this since Y2K
according to its comment. Sigh. You will never beat that by copying,
but I think mmapping /dev/zero is only much more optimal for silly
benchmarks.
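
For reference, a minimal sketch of the mmap approach from the user side, assuming a system where /dev/zero supports a private mapping (Linux and FreeBSD do). The helper name is hypothetical. No data is copied at mmap() time; pages are supplied zero-filled when first touched.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map len bytes of /dev/zero privately.  Returns a pointer to
 * zero-filled, writable memory, or NULL on failure.  The caller
 * releases it with munmap(p, len).
 */
static void *
map_zeros(size_t len)
{
	void *p;
	int fd;

	assert(len > 0);
	fd = open("/dev/zero", O_RDWR);
	if (fd < 0)
		return (NULL);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	close(fd);	/* the mapping stays valid after close */
	return (p == MAP_FAILED ? NULL : p);
}
```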
linux-2.6.10 also has a seekable /dev/zero. Seeks don't really work,
but some of them "succeed" and keep the offset at 0. ISTR
a FreeBSD PR about the file offset for /dev/zero not "working" because
it is garbage instead of 0. It is clearly a Linuxism to depend on it
being nonzero. IIRC, the file offset for device files is at best
implementation-defined in POSIX.
Bruce