ffmpeg & ULE
Pieter de Goeje
pieter at degoeje.nl
Wed Oct 19 01:20:17 UTC 2011
On Tuesday 18 October 2011 11:04:36 Urmas Lett wrote:
> Hello.
>
> Why is ffmpeg -threads massively slower with ULE than 4BSD?
>
> ffmpeg preset veryfast with sched_bsd:
> real 1m49.407s
> user 6m53.932s
> sys 0m1.700s
>
> ffmpeg preset veryfast with sched_ule:
> real 2m52.711s
> user 6m50.310s
> sys 0m1.582s
>
> #uname -a
> FreeBSD 9.0-RC1 FreeBSD 9.0-RC1 #0: Mon Oct 17 20:32:29 EEST
Since no-one has offered any insight about the cause (yet) I'll explain what I
think is happening here.
SCHED_ULE tries to make threads "sticky" to a certain CPU. This has benefits
in the realm of cache utilization and memory locality for NUMA systems. This
works great when all running threads do useful work all the time. The
downside is that not all threads will receive equal amounts of CPU time.
SCHED_BSD on the other hand much more aggressively reschedules running threads
on other CPUs with no regard for cache locality. This gives all threads an
equal share of CPU time.
I was unable to (quickly) find out the exact implementation details of
multithreaded ffmpeg, but I guess it simply splits up each frame into N equal
parts and uses N threads to encode these parts. The master process then
probably recombines them into the final frame. Because almost all encodings
compress video using the 3rd dimension (time) it must wait for the current
frame to finish before it can start encoding the next frame. Thus we end up
with a workload like this:
split1 -> N x encode1 -> recombine1 -> split2 -> N x encode2 -> recombine2
etc..
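
To make the consequence of that pipeline concrete, here is a toy model in
Python. It is only a sketch under loud assumptions: every thread does exactly
one unit of work per frame, split/recombine cost nothing, threads never
migrate within a frame, and the two placements (`balanced`, `skewed`) are
hypothetical, not measured from either scheduler. It only shows that with a
per-frame barrier the busiest CPU sets the wall time of each frame.

```python
def frame_wall_time(placement, work_per_thread=1.0):
    """Wall time of one frame: the barrier is reached only when the
    busiest CPU has finished all the threads placed on it."""
    load = {}
    for cpu in placement:
        load[cpu] = load.get(cpu, 0.0) + work_per_thread
    return max(load.values())


def encode(n_frames, placement):
    """Frames are strictly sequential, so wall times simply add up."""
    return sum(frame_wall_time(placement) for _ in range(n_frames))


n_frames = 100
balanced = [0, 1, 2, 3]  # assumption: 4 threads on 4 distinct CPUs
skewed = [0, 0, 1, 2]    # assumption: two threads stuck on CPU 0, CPU 3 idle

t_balanced = encode(n_frames, balanced)
t_skewed = encode(n_frames, skewed)
cpu_time = n_frames * len(balanced) * 1.0  # identical for both placements

print(f"balanced: wall={t_balanced:.0f} cpu={cpu_time:.0f}")
print(f"skewed:   wall={t_skewed:.0f} cpu={cpu_time:.0f}")
```

In this model the CPU time is the same for both placements, but the skewed
one takes twice as long in wall time because one CPU sits idle in every frame.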
This bursty behavior is a poor match for ULE, because ffmpeg's master process
assumes that all threads take equally long to run. ULE has no time to
properly load balance all threads before they die (or stop doing work). You
can see clearly in your timings that SCHED_ULE is actually just as fast when
we look at the amount of CPU time spent. However, because it is not nearly as
aggressive as SCHED_BSD in stealing threads from busy CPUs there is some idle
time in there as well. That idle time accounts for the difference in real
(wall-clock) time. There are some tunable sysctls (kern.sched.*) that might
help in this scenario.
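
The point about the timings can be checked with a bit of arithmetic on the
numbers quoted above. The sketch below divides CPU time (user + sys) by real
time to get the average number of CPUs kept busy. The helper names (`seconds`,
`busy_cpus`) are made up for this example, and the number of CPUs in the
machine is not stated in the original post, so no utilization percentage is
derived.

```python
def seconds(minutes, secs):
    """Convert the 'XmY.YYYs' format printed by time(1) to seconds."""
    return minutes * 60 + secs


# Timings copied from the quoted message.
runs = {
    "sched_bsd": {"real": seconds(1, 49.407),
                  "user": seconds(6, 53.932),
                  "sys": seconds(0, 1.700)},
    "sched_ule": {"real": seconds(2, 52.711),
                  "user": seconds(6, 50.310),
                  "sys": seconds(0, 1.582)},
}


def busy_cpus(run):
    """Average number of CPUs doing work over the whole run."""
    return (run["user"] + run["sys"]) / run["real"]


for name, run in runs.items():
    cpu = run["user"] + run["sys"]
    print(f"{name}: cpu={cpu:.1f}s real={run['real']:.1f}s "
          f"busy CPUs={busy_cpus(run):.2f}")
```

Both runs burn roughly the same CPU time (about 412 to 416 seconds), but
4BSD keeps about 3.8 CPUs busy on average while ULE keeps only about 2.4
busy; the rest is idle time.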
I bet that if you ran two ffmpeg processes in parallel you'd get about the
same runtimes for both schedulers.
(Disclaimer: I have collected most of this information from the mailing
lists, not the actual code, so I could be completely wrong)
--
Pieter de Goeje