What is the Constant Rate Factor?
The Constant Rate Factor (CRF) is the default quality (and rate control) setting for the x264 and x265 encoders. You can set the values between 0 and 51, where lower values would result in better quality, at the expense of higher file sizes. Higher values mean more compression, but at some point you will notice the quality degradation.
For x264, sane values are between 18 and 28. The default is 23, so you can use this as a starting point.
ffmpeg, it’d look like this:
ffmpeg -i input.mp4 -c:v libx264 -crf 23 output.mp4
For x265, the default CRF is 28:
ffmpeg -i input.mp4 -c:v libx265 -crf 28 output.mp4
If you’re unsure about what CRF to use, begin with the default and change it according to your subjective impression of the output. Is the quality good enough? No? Then set a lower CRF. Is the file size too high? Choose a higher CRF. A change of ±6 should result in about half/double the file size, although your results might vary.
CRF versus Constant QP
CRF is a “constant quality” encoding mode, as opposed to constant bitrate (CBR). Typically you would achieve constant quality by compressing every frame of the same type the same amount, that is, throwing away the same (relative) amount of information. In tech terminology, you maintain a constant QP (quantization parameter). The quantization parameter defines how much information to discard from a given block of pixels (a Macroblock). This typically leads to a hugely varying bitrate over the entire sequence.
Constant Rate Factor is a little more sophisticated than that. It will compress different frames by different amounts, thus varying the QP as necessary to maintain a certain level of perceived quality. It does this by taking motion into account. A constant QP encode at QP=18 will stay at QP=18 regardless of the frame (there is some small offset for different frame types, but it is negligible here). Constant Rate Factor at CRF=18 will increase the QP to, say, 20, for high motion frames (compressing them more) and lower it down to 16 for low motion parts of the sequence. This will essentially change the bitrate allocation over time.
For example, here is a figure (from another post of mine) that shows how the bitrate changes for two video clips encoded at different levels (17, 23) of constant QP or CRF:
The line for CRF is always lower than the line for CQP; it means that the encoder can save bits, while retaining perceptual quality, whereas with CQP, you waste space. This effect is quite pronounced in the first video clip, for example.
Why is motion so important?
The human eye perceives more detail in still objects than when they’re in motion. Because of this, a video compressor can apply more compression (drop more detail) when things are moving, and apply less compression (retain more detail) when things are still. In layperson’s terms, this is because your visual system will be “distracted” by everything going on, and won’t have the image on screen for enough time to see the heavier compression. Slightly more technically speaking, high motion “masks” the presence of compression artifacts like blocking. On the other hand, when a frame doesn’t have a lot of motion, you will (simply put) have more time to look at the image, and there will be nothing to distract you or mask any artifacts, so you want the frame to be as little compressed as possible. With low motion, compression artifacts become more salient and thus more distracting.
You may ask if constant QP isn’t really better quality in the end? No, the perceived quality is the same, but essentially it wastes space by compressing less in areas you really won’t notice.
If you had only simple ways at hand to compare the quality of video sequences (e.g., based on a per-frame measurement of signal to noise ratio, PSNR), you may look at a CRF encoding and say it was lower quality than the CQP variant. But if you’re a human being, subjectively, the CRF copy will look better. It least compresses the parts where you see details the most, and most compresses the parts where you see details the least. That means that while the average quality as objectively gauged by PSNR goes slightly down, the perceptible video quality goes up. This is also another argument against using PSNR to judge video quality: it cannot take into account perceptual effects like motion, since it only looks at individual frames.
Practically speaking, many people always use CRF for single-pass encodes and argue there is no reason to ever use CQP. Another good argument for using CRF is that it is the default rate control mode chosen for x264 and x265.
How do quality and bitrate relate to each other?
Not all video clips are equally “easy” to compress. Low motion and smooth gradients are easy to compress, whereas high motion and lots of spatial details are more demanding on an encoder. When I say “easy” or “hard”, I mean that an easy source will have better perceptual quality at the same bitrate than a source that is hard to encode. CRF takes care of this problem: with different videos, different CRF levels result in different bitrates. (In fact, you cannot reliably estimate what the resulting bitrate for a given CRF will be, unless you know more about that source.)
For example, if you set CRF 23, you may end up with 1,500 kBit/s for one source, but only 1,000 kBit/s for the other. They should look the same in terms of quality, though. With CRF, you are saying “use whatever bitrate is necessary to preserve this much detail.” It’s not a 1-to-1 thing.
Note that if your CRF is too high—for example if you use a CRF of 30—you’re going to see blocking on high-motion because the bitrate in these parts will simply be too low. The encoder will use a QP of (for example) 32 for the more complex parts, which is way too heavy a quantizer. As mentioned in the beginning, choose the CRF depending on what level of quality you want.
Why do you still see some blocky stuff on TV or cable?
Why are there blocky cable or satellite broadcasts? Or even online video streams? The problem is that they are using a too low bitrate for some parts of the video. Especially in broadcasting, streaming is done at a constant bitrate, which does not allow for variations to adapt to the level of motion. Therefore, those TV broadcasts get blocky because the complex things they’re displaying require more bits than the broadcaster has chosen to give them. They only say “preserve as much detail as you can while never going above this high a bitrate no matter how complicated things get.”
Streaming nowadays is done a little more cleverly. YouTube or Netflix are using 2-pass or even 3-pass algorithms, where in the latter, a CRF encode for a given source determines the best bitrate at which to 2-pass encode your stream. They can make sure that enough bitrate is reserved for complex scenes while not exceeding your bandwidth.
You can learn more about rate control modes in another post of mine.