DFLTPHYS vs MAXPHYS
Alexander Motin
mav at FreeBSD.org
Sun Jul 5 17:12:16 UTC 2009
Bruce Evans wrote:
> On Sun, 5 Jul 2009, Alexander Motin wrote:
>> Bruce Evans wrote:
>>> On Sun, 5 Jul 2009, Alexander Motin wrote:
>>> 64K is large enough to bust modern L1 caches and old L2 caches. Make
>>> the size bigger to bust modern L2 caches too. Interrupt rates don't
>>> matter when you are transferring 64K items per interrupt.
>>
>> How is cache size related to it, if DMA transfers data directly to RAM?
>> Sure, the CPU will invalidate the related cache lines, but why should
>> it invalidate everything?
>
> I was thinking more of transfers to userland. Increasing user buffer
> sizes above about half the L2 cache size guarantees busting the L2
> cache, if the application actually looks at all of its data. If the
> data is read using read(), then the L2 cache will be busted twice (or
> a bit less with nontemporal copying), first by copying out the data
> and then by looking at it. If the data is read using mmap(), then the
> L2 cache will only be busted once. This effect has always been very
> noticeable using dd. Larger buffer sizes are also bad for latency.
>
>> Small transfers give more work to all levels from GEOM down to
>> CAM/ATA, controllers and drives. It is not just context switching.
>
> Yes, I can't see any cache busting below the level of copyout(). Also,
> after you convert all applications to use mmap() instead of read(),
> the cache busting should become per-CPU.
Since file data usually passes through the buffer cache, it will be read
into different memory areas and copied out from them anyway. So I don't
see much difference there between one big transaction and several small
ones. Cache thrashing by userland will also depend only on the
user-level application's buffer size, not on the kernel's.
How can I reproduce that dd experiment? My system is running with a
MAXPHYS of 512K, and here is what I get:
# dd if=/dev/ada0 of=/dev/null bs=512k count=1000
1000+0 records in
1000+0 records out
524288000 bytes transferred in 2.471564 secs (212128024 bytes/sec)
# dd if=/dev/ada0 of=/dev/null bs=256k count=2000
2000+0 records in
2000+0 records out
524288000 bytes transferred in 2.666643 secs (196609752 bytes/sec)
# dd if=/dev/ada0 of=/dev/null bs=128k count=4000
4000+0 records in
4000+0 records out
524288000 bytes transferred in 2.759498 secs (189993969 bytes/sec)
# dd if=/dev/ada0 of=/dev/null bs=64k count=8000
8000+0 records in
8000+0 records out
524288000 bytes transferred in 2.718900 secs (192830927 bytes/sec)
Instead, CPU load grows from 10% at 512K to 15% at 64K. Maybe the
thrashing effect will only be noticeable at block sizes comparable to
the cache size, but modern CPUs have megabytes of cache.
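For anyone who wants to recheck the dd figures above, here is a minimal sketch that recomputes each run's throughput and expresses it relative to the 512k run. The byte counts and timings are copied from the dd output; the names RUNS, throughput and relative_to are made up for this illustration and are not part of any tool.

```python
# Figures copied from the dd runs above: block size in KiB -> (bytes, seconds).
RUNS = {
    512: (524288000, 2.471564),
    256: (524288000, 2.666643),
    128: (524288000, 2.759498),
    64: (524288000, 2.718900),
}


def throughput(nbytes, secs):
    """Return bytes per second for one dd run."""
    return nbytes / secs


def relative_to(baseline_kib, runs=RUNS):
    """Return each run's throughput as a fraction of the baseline run's."""
    base = throughput(*runs[baseline_kib])
    return {kib: throughput(*run) / base for kib, run in runs.items()}


if __name__ == "__main__":
    for kib, ratio in relative_to(512).items():
        rate = throughput(*RUNS[kib])
        print(f"bs={kib}k: {rate / 1e6:.1f} MB/s ({ratio:.1%} of the 512k run)")
```

The ratios only compare these four single runs with each other; they say nothing about run-to-run variance, which would need repeated measurements.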
--
Alexander Motin
More information about the freebsd-arch mailing list