(mis)Using FFmpeg’s Motion Interpolation Options

Towards the end of the Let’s Never Meet video the robotic faces slowly morph into something a little bit more human-like.

These faces continue to morph between lots of different faces, suggesting that when getting to know people you can never really settle on who they are. To make the faces morph I used motion interpolation to blend between each face. Here’s what Wikipedia has to say about motion interpolation:

Motion interpolation or motion-compensated frame interpolation (MCFI) is a form of video processing in which intermediate animation frames are generated between existing ones by means of interpolation, in an attempt to make animation more fluid, to compensate for display motion blur, and for fake slow motion effects.

For those who use proprietary software there are a few programs that can do this, including Twixtor and After Effects.

If, like me, you only use open source software there are a few options, but they’re not integrated within a general post-processing or video editing GUI.

slowmoVideo

slowmoVideo is an open source application which allows you to vary the speed of a video clip over time. I previously used it for the background images in the Visually Similar artwork.

For Let’s Never Meet I did consider using slowmoVideo again. What I like about it is being able to vary the speed, and that it has a GUI. However, development on it seems kinda slow and, most importantly, it requires a GPU. Occasionally I find myself working on a machine that only has integrated graphics (i.e. no dedicated GPU), which makes using slowmoVideo impractical. So, I needed something that would reliably work on a CPU and produce similar, if not the same, visual results as slowmoVideo.

Butterflow

Butterflow is another piece of software for motion interpolation. It doesn’t have a native GUI but it does have a nice set of command line options. Sadly it seems impossible to install on Linux. Many have tried, many have failed.

FFmpeg

Finally I tried FFmpeg. Pretty much all my artworks use FFmpeg at some point, whether as the final stage in compiling a Blender render or as the backend to a video editor or video converter. I’m already very familiar with how FFmpeg works and feel it can be relied on to keep working and to be developed in the future.

I actually first came across FFmpeg’s motion interpolation options sometime in late 2018, but only really cemented my understanding of how to use it in making Let’s Never Meet.

Going through FFmpeg’s minterpolate options was quite daunting at first. There are lots of options, each with a description of how it works, but I didn’t really understand what results they would produce. Nonetheless I mixed and matched settings until I produced something close to my liking.

The first step in making the morphed video was making the original-speed video.

I’ve slowed the above video down so you can see each frame, but if you want the original video you can download it here. It consisted of 47 faces/images, played at one image per frame. In total it lasted 1.88 seconds, and I needed to slow it down to at least x minutes, which is the length of the video.
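
As an aside, if you want to build a clip like this yourself, ffmpeg can assemble a numbered image sequence at one image per frame. Here’s a minimal sketch, assuming the faces are numbered face_01.png to face_47.png (my own naming, not the files I actually used):

ffmpeg -framerate 25 -i face_%02d.png -c:v libx264 -pix_fmt yuv420p -y lnm_faces_original.mp4

(at 25 fps, 47 frames gives the 1.88 seconds mentioned above)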

Here is the code that I used:

ffmpeg -i lnm_faces_original.mp4 -filter:v "setpts=40*PTS,minterpolate='fps=25:scd=none:me_mode=bidir:vsbmc=1:search_param=400'" -y output.mp4

I’ll explain three of the important parts of this code.

setpts

The FFmpeg wiki has a good explanation of what setpts does:

To double the speed of the video, you can use:

ffmpeg -i input.mkv -filter:v "setpts=0.5*PTS" output.mkv

The filter works by changing the presentation timestamp (PTS) of each video frame. For example, if there are two successive frames shown at timestamps 1 and 2, and you want to speed up the video, those timestamps need to become 0.5 and 1, respectively. Thus, we have to multiply them by 0.5.

So, by using setpts=40*PTS I’m essentially slowing the video down by a factor of 40. For this video I took a guess at how much I’d need to multiply the video of the faces to make it match the length of the full video. If I wanted to be exact I’d need to use some maths: take the frame count of the full video (5268), divide it by the frame count of the face video (47) and use the result (112.085106383) as the PTS multiplier.
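
If you’d rather calculate that multiplier than guess it, ffprobe can count the frames in each video for you. Here’s a rough sketch; final_video.mp4 is a placeholder name for the full-length video:

# count the decoded frames in each video (this decodes everything, so it can be slow)
faces=$(ffprobe -v error -select_streams v:0 -count_frames -show_entries stream=nb_read_frames -of csv=p=0 lnm_faces_original.mp4)
total=$(ffprobe -v error -select_streams v:0 -count_frames -show_entries stream=nb_read_frames -of csv=p=0 final_video.mp4)
# e.g. 5268 / 47 ≈ 112.085106383
echo "scale=9; $total / $faces" | bc

The result can then be used directly, e.g. setpts=112.085106383*PTS.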

scd

scd is probably the most important part of this code. It attempts to detect if there are any scene changes and then performs no motion interpolation on those frames. In this scenario, however, I want to interpolate between every frame, regardless of whether they appear to be part of the same “scene”. If you leave scd at its default of fdiff and scd_threshold at 5.0, ffmpeg tries to decide whether there’s enough difference between frames to count as a scene change. Here’s what that would’ve looked like:


ffmpeg -i faces.mp4 -filter:v "setpts=40*PTS,minterpolate='fps=25:me_mode=bidir:vsbmc=1:search_param=400'" -y lnm_faces_scd.mp4
(without setting scd the defaults are assumed)

Not ideal, so I disabled it by setting it to none.
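
For clarity, the scene-detection version above should be equivalent to writing those defaults out explicitly, which also makes them easier to tweak when experimenting (this assumes the fdiff/5.0 defaults mentioned above):

ffmpeg -i faces.mp4 -filter:v "setpts=40*PTS,minterpolate='fps=25:scd=fdiff:scd_threshold=5.0:me_mode=bidir:vsbmc=1:search_param=400'" -y lnm_faces_scd.mp4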

search_param

This one I don’t quite understand internally, but I do understand how it affects the video. If I were to leave the setting at its default value of 32, you can see that when it interpolates there isn’t much movement:


ffmpeg -i faces.mp4 -filter:v "setpts=40*PTS,minterpolate='fps=25:scd=none:me_mode=bidir:vsbmc=1:search_param=32'" -y search_param_32.mp4

With the value of 400 which I used:


ffmpeg -i faces.mp4 -filter:v "setpts=40*PTS,minterpolate='fps=25:scd=none:me_mode=bidir:vsbmc=1:search_param=400'" -y search_param_400.mp4

And with the slightly ridiculous value of 2000:


ffmpeg -i faces.mp4 -filter:v "setpts=40*PTS,minterpolate='fps=25:scd=none:me_mode=bidir:vsbmc=1:search_param=2000'" -y search_param_2000.mp4

The biggest difference is clearly between setting search_param from 32 to 400. At 2000 there are only minor differences, though this may change depending on your source input.
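
If you want to run the same comparison on your own footage, a small shell loop can render one clip per search_param value. A minimal sketch, using the same filter chain as above:

# render one test clip per search_param value
for sp in 32 400 2000; do
  ffmpeg -i faces.mp4 -filter:v "setpts=40*PTS,minterpolate='fps=25:scd=none:me_mode=bidir:vsbmc=1:search_param=$sp'" -y "search_param_$sp.mp4"
done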

It’s morphin’ time!

With all the settings of minterpolate now set I created the final video:


(I reduced the quality of the video a little bit to save on bandwidth)

I quite like the end results. It doesn’t look the same as the output of slowmoVideo, in that the morphing happens in blocks rather than having the dust-grain look of slowmoVideo’s output. However, in using FFmpeg I can now use a familiar program that works on the CPU, even if it does take a long time!