Motion Interpolation for Glitch Aesthetics using FFmpeg part 0

As you may have seen in this blog post I made use of FFmpeg’s minterpolate motion interpolation options to make all of the faces morph. There’s quite a few options for minterpolate and many different combinations of options that can be used. i had to consult Wikipedia to figure out exactly what the different motion estimation algorithms were but even with that information I couldn’t visualise how it would change the output. To add to this how I’m using minterpolate isn’t a typical use case.

To make things easier for those wishing to use FFmpeg’s minterpolate to create glitch aesthetics I have compiled 36 videos each showing a different combination of processing options. The source video can be seen below and features two of my favourite things: cats (obtained from here) and rainbows.

I’ve slowed it down so that you can see exactly what’s in the video, but the original can be downloaded here.

The base script used for each video is:

ffmpeg -i cat_rainbow_original.mp4 -filter:v "setpts=62.5*PTS,minterpolate='fps=25:mb_size=16:search_param=400:vsbmc=0:scd=none:

In part two of March’s Development Update I explained why I set scd to none and search_param to 400. I could have of course documented what happens when all of the processing options are changed but that would result in me having to make hundreds of videos! The options that were changed were the mc_mode (motion compensation mode), me_mode (motion estimation mode), and me (motion estimation algorithm).

Test conditions

These videos were created using FFmpeg 7:4.1.4-1build2, installed from the Ubuntu repositories, on a Dell XPS 15 (2017 edition) with 16GB memory, a i7 processor and an Nvidia GeForce GTK 1050 graphics card, all running on Ubuntu 19.10 using proprietary drivers.

I don’t have a Windows or Mac machine, and haven’t used other Linux distributions so can’t test these scripts in those conditions. If there’s any problems with getting FFmpeg on your machine it’s best that you contact the developers for assistance.

Observations

My first observation is that the esa me_mode takes frikkin ages to complete! Each video using this me_mode took about four hours to process. I did consider killing the script but for completeness I let it run.

Using bilat me_mode produces the most chaotic results by far. Just compare 026_mc_mode=obmc_me_mode=bilat_me=epzs.mp4 to 008_mc_mode=obmc_me_mode=bidir_me=epzs.mp4 and you’ll see what I mean.

For a video of this length nearly all of the scripts (except for those using esa) took between 30 seconds and 1 minute to complete, and that’s on machines with and without a GPU. This is good news if you don’t want to have to carry around a powerhouse laptop all the time.

All of this reminds me a bit of datamoshing. It’s more predictable and controllable, but the noise and melty movement it creates, especially some of the ones using bilat me_mode, remind me of the bloom effect in datamoshing. This could be down to the source material, and I’d be interested to see experiments involving datamoshed videos.

Let’s a go!

With that all said let’s jump into sharing the results. As there’s 36 videos I’ll be splitting it over nine blog posts over nine days, with the last being posted on 28th March 2020. Each will contain the script I used as well as the output video. Links to each part can be found below:

(mis)Using FFmpeg’s Motion Interpolation Options

Towards the end of the Let’s Never Meet video the robotic faces slowly morph into something a little bit more human-like.

These faces continue to morph between lots of different faces, suggesting that when getting to know people you can never really settle on who they are. To make the faces morph I used motion interpolation to morph between each face. Here’s what Wikipedia has to say about motion interpolation.

Motion interpolation or motion-compensated frame interpolation (MCFI) is a form of video processing in which intermediate animation frames are generated between existing ones by means of interpolation, in an attempt to make animation more fluid, to compensate for display motion blur, and for fake slow motion effects.

For those that use proprietary software there’s a few that can do this, including Twixtor and After Effects.

If, like me, you only use open source software there are a few options but they’re not integrated within a general post processing or video editing GUI.

slowmoVideo

slowmoVideo is an open source application which allows you to vary the speed of a video clip over time. I used this previously for the background images in the Visually Similar Artwork.

For Let’s Never Meet I did consider using slomoVideo again. What I like about it is being able to vary the speed and that it has a GUI. However, development on it seems kinda slow and, most importantly, it requires a GPU. Occasionally I find myself working on a machine that only has integrated graphics (i.e. no GPU), which makes using slomoVideo practically impractical. So, I needed something that would reliably work on a CPU and produced similar if not same visual results as slomoVideo.

Butterflow

Butterflow is another software for motion interpolation. It doesn’t have a native GUI but it has a nice set of command line options. Sadly it seems impossible to install on Linux. Many have tried, many have failed.

FFmpeg

Finally I tried FFmpeg. Pretty much all my artworks use FFmpeg at some point, whether as the final stage in compiling a Blender render or as the backend to a video editor or video converter. I’m already very familiar with how FFmpeg works and feel it can be relied to work an be developed in the future.

I actually first came across FFmpeg’s motion interpolation options sometime in late 2018, but only really cemented my understanding of how to use it in making Let’s Never Meet.

Going through FFmpeg’s minterpolate options was quite daunting at first. There’s lots of options which have descriptions on how they work but I didn’t really understand what results they would produce. Nonetheless I mixed and matched settings until I produced something close to my liking.

The first step in making the morphed video was making original speed video.

I’ve slowed the above video down so you can see each frame, but if you want the original video you can download it here. This consisted 47 faces/images, played one image per frame. In total it lasted 1.88 seconds and I needed to slow it down to at least x minutes, which is the length of the video.

Here is the code that I used

ffmpeg -i lnm_faces_original.mp4 -filter:v "setpts=40*PTS,minterpolate='fps=25:scd=none:me_mode=bidir:vsbmc=1:search_param=400'" -y output.mp4

I’ll explain three of the important parts of this code.

setpts

The FFmpeg wiki has a good explanation of what setpts does:

To double the speed of the video, you can use:

ffmpeg -i input.mkv -filter:v "setpts=0.5*PTS" output.mkv

The filter works by changing the presentation timestamp (PTS) of each video frame. For example, if there are two successive frames shown at timestamps 1 and 2, and you want to speed up the video, those timestamps need to become 0.5 and 1, respectively. Thus, we have to multiply them by 0.5.

So, by using setpts=40*PTS I’m essentially slowing the video down by a factor of 40. For this video I took a guess at much I’d need to multiply the video of the faces to make it match the length of the video. If I wanted to be exact I’d need to use some maths and divide the frame count of the video (5268), divide it by the frame count of the face video (47) and use the output (112.085106383) as the PTS multiplier.

scd

scd is probably the most important part of this code. It attempts to detect if there’s any scene changes and then not perform any motion interpolation on those frames. In this scenario, however, I want to interpolate between every frame, regardless of whether they appear to be part of the same “scene”. If you leave scd at the default of fdiff and scd_threshold at 5.0 ffmpeg tries to decide if there’s enough difference between frames. Here’s what that would’ve looked like:


ffmpeg -i faces.mp4 -filter:v "setpts=40*PTS,minterpolate='fps=25:me_mode=bidir:vsbmc=1:search_param=400'" -y lnm_faces_scd.mp4
(without setting scd the defaults are assumed)

Not ideal, so I disabled it by setting it to none.

search_param

This one I don’t quite understand but I understand how it affects the video. If I were to leave the setting with the default value of 32 then you can see that when it interpolates there isn’t much movement:


ffmpeg -i faces.mp4 -filter:v "setpts=40*PTS,minterpolate='fps=25:scd=none:me_mode=bidir:vsbmc=1:search_param=32'" -y search_param_32.mp4

With the value of 400 which I used:


ffmpeg -i faces.mp4 -filter:v "setpts=40*PTS,minterpolate='fps=25:scd=none:me_mode=bidir:vsbmc=1:search_param=400'" -y search_param_400.mp4

And with the slightly ridiculous value of 2000:


ffmpeg -i faces.mp4 -filter:v "setpts=40*PTS,minterpolate='fps=25:scd=none:me_mode=bidir:vsbmc=1:search_param=2000'" -y search_param_2000.mp4

The biggest difference is clearly between setting the search_param from 32 to 400. At 2000 there’s only minor differences, though this may change depending on your source input.

It’s morphin’ time!

With all the settings of minteroplate now set I created the final video:


(I reduced the quality of the video a little bit to save on bandwidth)

I quite like the end results. It doesn’t look the same as the output of slowmoVideo in that it the morphing happens in blocks and doesn’t look like the dust grains output of slomoVideo. However, in using FFmpeg I can now use a familiar program that works on the CPU, even if it does take a long time!