2025-10
Size prediction for a given preset and CRF, based on the size measured at another CRF and a (faster) preset.
Attempts to predict the file size for a given preset and CRF from the file size measured at another preset (typically a much faster one) and another CRF. In short: run 1 <= n <= 4 quick or very quick encodes, then determine the CRF to use with a much slower encode in order to reach a given target size. The right value of n still has to be determined; a sketch of the loop follows below.
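A minimal sketch of that loop in Python, assuming a hypothetical run_encode(preset, crf) helper and the exp(cubic) model described just below; mapping the probe preset to the slow target preset is the scaling question discussed further down.

import numpy as np

def run_encode(preset: int, crf: int) -> int:
    """Hypothetical helper: encode the sample clip at (preset, crf)
    and return the output file size in bytes."""
    raise NotImplementedError  # wraps whatever encoder CLI is in use

def pick_crf(target_size, probe_preset=12, probe_crfs=(25, 35, 45, 55)):
    # n quick probe encodes at a fast preset (n = 4 here; the right n
    # is still an open question, per the note above).
    sizes = [run_encode(probe_preset, crf) for crf in probe_crfs]
    # log(size) is modelled as a cubic in crf (see below), so a plain
    # polynomial fit on the log sizes recovers the model.
    coeffs = np.polyfit(probe_crfs, np.log(sizes), 3)
    # Invert the model with a coarse grid search over integer CRFs.
    # Note this predicts sizes at the probe preset; translating to the
    # slow preset is the scaling discussed further down.
    crfs = np.arange(min(probe_crfs), max(probe_crfs) + 1)
    predicted = np.exp(np.polyval(coeffs, crfs))
    return int(crfs[np.argmin(np.abs(predicted - target_size))])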
The size = f(crf) function can be approximated well enough by an exponential.
The function can be approximated even better by exp(a * x³ + b * x² + c * x + d), i.e. log(size) is then a cubic polynomial in the CRF.
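To see the difference between the two models, one can compare the residuals of a degree-1 and a degree-3 fit on log(size); the measurements below are made up for illustration only.

import numpy as np

# Made-up probe measurements, for illustration only.
crf  = np.array([15.0, 25.0, 35.0, 45.0, 55.0])
size = np.array([420e6, 150e6, 55e6, 21e6, 8e6])  # bytes

log_size = np.log(size)
lin   = np.polyfit(crf, log_size, 1)  # plain exponential: exp(c*x + d)
cubic = np.polyfit(crf, log_size, 3)  # exp(a*x³ + b*x² + c*x + d)

for name, fit in (("exponential", lin), ("exp(cubic)", cubic)):
    resid = log_size - np.polyval(fit, crf)
    print(f"{name}: max |log residual| = {np.abs(resid).max():.4f}")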
In 3D, size = f(crf, preset) is at least C¹: it is continuous, differentiable everywhere, and the derivative is itself continuous (it looks C² or more). This indicates the shape isn't random; there is an underlying structure which can probably be taken advantage of.
exp(-0.000004 * x³ - 0.000043 * x² - 0.083551 * x + 7.367942) is a very good approximation, deviating by a few megabytes at most.
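For reuse, those coefficients can be wrapped in a small helper; the unit of the result is whatever unit the fitted sizes were in, which isn't stated here.

import numpy as np

def predicted_size(crf):
    # Coefficients copied from the fit above, ordered as in the formula.
    return np.exp(7.367942
                  - 0.083551 * crf
                  - 0.000043 * crf**2
                  - 0.000004 * crf**3)

print(predicted_size(40))  # predicted size at crf=40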
It's useful to scale file sizes relative to the one obtained at e.g. preset=2 and crf=40, which are intermediate, sensible values.
Working with a logarithmic scale gives a much finer picture.
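A sketch of that normalisation, assuming measured sizes are kept in a dict keyed by (preset, crf); only the reference point comes from the notes above, the sizes themselves are made up.

import numpy as np

# measured[(preset, crf)] -> file size in bytes; values are made up.
measured = {
    (2, 40): 31e6,
    (2, 30): 80e6,
    (8, 40): 36e6,
    (8, 30): 95e6,
}

ref = measured[(2, 40)]  # reference encode: intermediate preset and CRF

relative = {k: v / ref for k, v in measured.items()}
# The log of the relative sizes gives the finer-grained picture
# mentioned above.
log_relative = {k: float(np.log(v)) for k, v in relative.items()}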
I couldn't find an approximation for the 3D surface because I haven't found any way to approximate non-trivial, non-symmetric 3D surfaces, at least not with a usable implementation. But maybe scaling everything based on one data point will work well enough.
The plot above can be re-created in gnuplot from the extrapolated data points with the following command (using 2:1:3 maps column 2 to x, column 1 to y and column 3 to z):
splot 'predicted-relative-size-vs-crf-and-preset.dat' using 2:1:3 with linespoints palette
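The column layout of the .dat file isn't given; judging from the file name and the using 2:1:3 clause, one plausible layout is crf, preset, relative size. A sketch that writes such a grid from the fitted model above; the preset axis isn't modelled here (see above), so every preset row comes out identical, and the preset and CRF ranges are assumptions.

import numpy as np

def predicted_size(crf):
    # Fitted model from above.
    return np.exp(7.367942 - 0.083551*crf - 0.000043*crf**2 - 0.000004*crf**3)

ref = predicted_size(40)  # reference point, as above

with open("predicted-relative-size-vs-crf-and-preset.dat", "w") as f:
    for preset in range(0, 14):      # assumed preset range
        for crf in range(10, 64):    # assumed CRF range
            rel = predicted_size(crf) / ref
            f.write(f"{crf} {preset} {rel:.6f}\n")
        f.write("\n")  # blank line between scan lines, as splot expects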