Merge video, audio, images into one video

🤖/video/merge composes a new video by adding an audio track to existing still image(s) or video.

Parameters

output_meta
Record<string, boolean> | boolean
Specifies a set of metadata that is more CPU-intensive to calculate and is therefore disabled by default to keep your Assemblies processing fast.

For images, you can add "has_transparency": true to this object to detect whether the image contains transparent parts, and "dominant_colors": true to extract an array of hexadecimal color codes from the image.

For videos, you can add "colorspace": true to extract the colorspace of the output video.

For audio, you can add "mean_volume": true to get a single value representing the mean average volume of the audio file.

You can also set this to false to skip metadata extraction and speed up transcoding.
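As a minimal sketch, here is a Step that requests the video colorspace, written in Python as a plain dict and serialized to JSON. The Step layout ("robot" plus parameters) and the simplified "use": ":original" wiring are illustrative assumptions, not a complete Assembly.

```python
import json

# Hypothetical /video/merge Step. "use" is simplified to ":original";
# "output_meta" asks for the extra colorspace metadata of the output video.
merge_step = {
    "robot": "/video/merge",
    "use": ":original",
    "output_meta": {"colorspace": True},
}

# Python's True is serialized as JSON true: {"colorspace": true}
print(json.dumps(merge_step, indent=2))
```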
result
boolean (default: false)
Whether the results of this Step should be present in the Assembly Status JSON.
queue
"batch"
Setting the queue to "batch" manually downgrades the priority of this Step's jobs, so they don't consume Priority job slots. Use it for jobs that don't need zero queue waiting times.
force_accept
boolean (default: false)
Force a Robot to accept a file type it would have ignored.

By default, Robots ignore files they are not familiar with. 🤖/video/encode, for example, will happily ignore input images.

With the force_accept parameter set to true, you can force Robots to accept all files thrown at them. This will typically lead to errors and should only be used for debugging or combatting edge cases.
use
string | Array<string> | Array<object> | object
Specifies which Step(s) to use as input.
- You can pick any names for Steps except ":original" (reserved for user uploads handled by Transloadit)
- You can provide several Steps as input with arrays:
```
{
  "use": [
    ":original",
    "encoded",
    "resized"
  ]
}
```
Tip

That’s likely all you need to know about use, but you can view Advanced use cases.
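For orientation, here is a minimal sketch of where use, result and queue sit in a set of Assembly Instructions, built in Python as a dict and serialized to JSON. The "steps" wrapper and the "/upload/handle" Robot for ":original" are assumptions about the surrounding Assembly layout that this page does not document; "merged" is an arbitrary Step name.

```python
import json

# Assumed Assembly Instructions layout: a "steps" object keyed by Step name.
# ":original" is the reserved name for uploads; "merged" is an arbitrary name.
instructions = {
    "steps": {
        ":original": {"robot": "/upload/handle"},
        "merged": {
            "robot": "/video/merge",
            "use": [":original"],  # take all uploaded files as input
            "result": True,        # include this Step's results in the Assembly Status JSON
            "queue": "batch",      # not urgent: don't take up Priority job slots
        },
    },
}

print(json.dumps(instructions, indent=2))
```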
ffmpeg
object
A parameter object to be passed to FFmpeg. If a preset is used, these options are merged on top of the preset's options, so any option specified here takes precedence over the preset. For available options, see the FFmpeg documentation.
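The precedence rule can be pictured as a dict overlay. This is a sketch under an assumption: it treats the merge as shallow and key-by-key, which the text above does not spell out, and the preset contents are made up for illustration (they are not Transloadit's real preset values).

```python
def merge_ffmpeg_options(preset_options: dict, step_ffmpeg: dict) -> dict:
    """Overlay the Step's "ffmpeg" options on the preset's options.

    Keys present in both take the Step's value; this is a shallow merge.
    """
    return {**preset_options, **step_ffmpeg}


# Hypothetical preset options, for illustration only.
preset_options = {"c:v": "libx264", "b:v": "1000k"}
step_ffmpeg = {"b:v": "2500k"}

print(merge_ffmpeg_options(preset_options, step_ffmpeg))
# {'c:v': 'libx264', 'b:v': '2500k'}
```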
ffmpeg_stack
"v5" | "v6" | "v7" | string (default: "v5.0.0")
Selects the FFmpeg stack version to use for encoding. These versions reflect real FFmpeg versions. We currently recommend using "v6.0.0".
width
string | number | null
Width of the new video, in pixels.

If the value is not specified and the preset parameter is set, the preset's width is used.
height
string | number | null
Height of the new video, in pixels.

If the value is not specified and the preset parameter is set, the preset's height is used.
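The fallback for width and height amounts to "explicit value first, preset second". A minimal sketch, assuming that null and an omitted value are treated the same; effective_dimension is a hypothetical helper, and what the Robot does when neither value exists is not covered here.

```python
from typing import Optional, Union

Dimension = Optional[Union[str, int]]


def effective_dimension(explicit: Dimension, preset_value: Dimension) -> Dimension:
    """Pick the width (or height) to use: the explicit value wins, else the preset's.

    None stands for "not specified". A None result means this sketch cannot
    tell which size would be used.
    """
    return explicit if explicit is not None else preset_value


print(effective_dimension(None, 1280))  # 1280, taken from the preset
print(effective_dimension(640, 1280))   # 640, the explicit value wins
```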
preset
"android" | "android-high" | "android-low" | "android_high" | "android_low" | "dash-1080p-video" | "dash-1080p_video" |
Converts a video according to pre-configured settings.

If you specify your own FFmpeg parameters using the Robot's ffmpeg parameter and/or do not want Transloadit to set any encoding settings, you can use the value 'empty' here, starting with ffmpeg_stack: "v6".
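A minimal sketch of such a Step, again as a Python dict serialized to JSON. The FFmpeg option names and values in "ffmpeg" are hypothetical placeholders for whatever settings you want to supply yourself, and "use" is simplified.

```python
import json

# Sketch of a Step that opts out of Transloadit's encoding settings.
# "empty" is only described for ffmpeg_stack "v6" onward, so the stack is pinned.
merge_step = {
    "robot": "/video/merge",
    "use": ":original",
    "ffmpeg_stack": "v6.0.0",
    "preset": "empty",
    # Hypothetical FFmpeg options you would then supply yourself.
    "ffmpeg": {"c:v": "libx264", "crf": 23},
}

print(json.dumps(merge_step, indent=2))
```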
resize_strategy
"crop" | "fit" | "fillcrop" | "min_fit" | "pad" | "stretch" (default: "pad")
If the given width/height parameters are bigger than the input image's dimensions, then the resize_strategy determines how the image will be resized to match the provided width/height. See the available resize strategies.
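To illustrate the default "pad" strategy, here is a sketch of common fit-and-pad geometry: scale the input to fit inside the target while keeping its aspect ratio, then center it and fill the rest with the background color. This is an assumption about how "pad" behaves; Transloadit's exact rounding and placement may differ, and pad_geometry is a hypothetical helper.

```python
from __future__ import annotations


def pad_geometry(src_w: int, src_h: int, dst_w: int, dst_h: int) -> tuple[int, int, int, int]:
    """Fit (src_w, src_h) inside (dst_w, dst_h) keeping the aspect ratio, then center it.

    Returns (scaled_w, scaled_h, pad_left, pad_top); the remaining area is padding.
    """
    scale = min(dst_w / src_w, dst_h / src_h)
    scaled_w = round(src_w * scale)
    scaled_h = round(src_h * scale)
    pad_left = (dst_w - scaled_w) // 2
    pad_top = (dst_h - scaled_h) // 2
    return scaled_w, scaled_h, pad_left, pad_top


# A 640x480 input padded to 1920x1080: scaled to 1440x1080 with 240px bars left and right.
print(pad_geometry(640, 480, 1920, 1080))  # (1440, 1080, 240, 0)
```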
background
string (default: "#00000000")
The background color of the resulting video, in the "rrggbbaa" format (red, green, blue, alpha), when used with the "pad" resize strategy. The default color is black.
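The "rrggbbaa" format packs four two-digit hexadecimal channels into one string. A minimal parsing sketch; parse_rrggbbaa is a hypothetical helper, and accepting the leading "#" follows the default value shown above.

```python
from __future__ import annotations


def parse_rrggbbaa(color: str) -> tuple[int, int, int, int]:
    """Split a "#rrggbbaa" color into (red, green, blue, alpha) values from 0 to 255."""
    digits = color[1:] if color.startswith("#") else color
    if len(digits) != 8:
        raise ValueError(f"expected 8 hex digits, got {color!r}")
    red, green, blue, alpha = (int(digits[i:i + 2], 16) for i in range(0, 8, 2))
    return red, green, blue, alpha


print(parse_rrggbbaa("#00000000"))  # (0, 0, 0, 0): the default
print(parse_rrggbbaa("#ff8000ff"))  # (255, 128, 0, 255)
```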
framerate
string | number (default: "1/5")
When merging images to generate a video this is the input framerate. A value of "1/5" means each image is given 5 seconds before the next frame appears (the inverse of a framerate of "5"). Likewise for "1/10", "1/20", etc. A value of "5" means there are 5 frames per second.
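Since the framerate is in frames per second, the time each image stays on screen is its inverse. A minimal sketch using Python's fractions module so "1/5" is handled exactly; seconds_per_image is a hypothetical helper.

```python
from __future__ import annotations

from fractions import Fraction


def seconds_per_image(framerate: str | int | float) -> Fraction:
    """Convert a framerate such as "1/5" or "5" into the seconds each image is shown."""
    fps = Fraction(str(framerate))  # Fraction accepts "1/5", "5" and "2.5"
    if fps <= 0:
        raise ValueError("framerate must be positive")
    return 1 / fps


print(seconds_per_image("1/5"))  # 5
print(seconds_per_image("5"))    # 1/5
```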
image_durations
Array<string | number> (default: [])
When merging images to generate a video, this defines how long (in seconds) each image is shown in the video. For example, if you pass 3 images and define [2.4, 5.6, 9], the first image is shown for 2.4s, the second for 5.6s and the last one for 9s. The duration parameter is automatically set to the sum of the image_durations, 17 in this example. You can still override it, in which case the last image is shown until the defined duration is reached.
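The automatic duration is a running sum over image_durations. This sketch computes both the total and each image's start time; it uses fractions so that 2.4 + 5.6 comes out as exactly 8, and image_schedule is a hypothetical helper.

```python
from fractions import Fraction
from itertools import accumulate


def image_schedule(image_durations):
    """Return (total, start_times) in seconds for a list of per-image durations.

    Fractions keep 2.4 + 5.6 exactly 8 instead of a float approximation.
    """
    exact = [Fraction(str(d)) for d in image_durations]
    # Prefix sums starting at 0, minus the final total, give each image's start time.
    starts = list(accumulate(exact, initial=Fraction(0)))[:-1]
    return sum(exact, Fraction(0)), starts


total, starts = image_schedule([2.4, 5.6, 9])
print(float(total))                # 17.0, the automatic "duration"
print([float(s) for s in starts])  # [0.0, 2.4, 8.0]
```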
duration
string | number (default: 5)
When merging images to generate a video, or when merging audio and video, this is the desired target duration in seconds. The value can have at most one decimal digit. If you want all images to be displayed exactly once, set the duration according to this formula: duration = numberOfImages / framerate. This also works for inverse framerate values like 1/5.

If you set this value to null (default), then the duration of the input audio file will be used when merging images with an audio file.

When merging audio files and video files, the duration of the longest video or audio file is used by default.
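Here is the formula duration = numberOfImages / framerate as a small worked sketch. Because the parameter takes at most one decimal digit, a result such as 7/3 seconds has to be rounded; rounding to one decimal place is this sketch's assumption, and duration_for_all_images is a hypothetical helper.

```python
from fractions import Fraction


def duration_for_all_images(number_of_images: int, framerate: str) -> float:
    """duration = numberOfImages / framerate, rounded to one decimal digit."""
    exact = number_of_images / Fraction(framerate)
    return round(float(exact), 1)


print(duration_for_all_images(3, "1/5"))  # 15.0: three images, 5 seconds each
print(duration_for_all_images(7, "3"))    # 2.3: 7/3 seconds, rounded
```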
audio_delay
string | number (default: 0)
When merging a video and an audio file, and when merging images and an audio file to generate a video, this is the desired delay in seconds for the audio file to start playing. Imagine you merge a video file without sound and an audio file, but you wish the audio to start playing after 5 seconds and not immediately, then this is the parameter to use.
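A sketch of the scenario above: a Step with audio_delay set to 5, plus a helper showing where the audio lands on the output timeline. The "use" wiring is simplified and assumed to supply one silent video and one audio file. audio_window is hypothetical and assumes the delay only shifts the audio's start; whether the tail is cut off depends on duration.

```python
from __future__ import annotations

import json

# Hypothetical Step: input wiring is simplified to ":original", assumed to hold
# one silent video and one audio file.
merge_step = {
    "robot": "/video/merge",
    "use": ":original",
    "audio_delay": 5,  # start the audio 5 seconds into the output
}


def audio_window(audio_length: float, audio_delay: float) -> tuple[float, float]:
    """Return (start, end) of the audio on the output timeline, in seconds."""
    return audio_delay, audio_delay + audio_length


print(json.dumps(merge_step))
print(audio_window(30, 5))  # (5, 35): a 30-second track starting at 5s
```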
loop
boolean (default: false)
Determines whether the shorter media file should be looped to match the duration of the longer one. For example, if you merge a 1-minute video with a 3-minute audio file and enable this option, the video will play three times in a row to match the audio length.
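The number of plays in the example above is a ceiling division. This sketch assumes the final pass is simply cut off once the longer media ends, which the text does not state for lengths that don't divide evenly; loop_passes is a hypothetical helper.

```python
import math


def loop_passes(shorter_seconds: float, longer_seconds: float) -> int:
    """Number of times the shorter media plays to cover the longer one.

    The last pass may be partial, e.g. 2.5 minutes of a 1-minute loop counts as 3 passes.
    """
    if shorter_seconds <= 0:
        raise ValueError("shorter_seconds must be positive")
    return math.ceil(longer_seconds / shorter_seconds)


print(loop_passes(60, 180))  # 3: the example above
print(loop_passes(60, 150))  # 3: third pass cut off after 30 seconds
```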
replace_audio
boolean (default: false)
Determines whether the audio of the video should be replaced with a provided audio file.
vstack
boolean (default: false)
Stacks the input media vertically. All streams need to have the same pixel format and width, so consider using a 🤖/video/encode Step before this one to enforce this.
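Before enabling vstack, you could check the requirement on your side. A minimal sketch: StreamInfo and can_vstack are hypothetical, and obtaining each stream's width and pixel format (for example with ffprobe) is outside its scope.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StreamInfo:
    width: int
    pix_fmt: str  # FFmpeg pixel format name, e.g. "yuv420p"


def can_vstack(streams: list[StreamInfo]) -> bool:
    """True if there is at least one stream and all share one width and one pixel format."""
    return (
        len(streams) > 0
        and len({s.width for s in streams}) == 1
        and len({s.pix_fmt for s in streams}) == 1
    )


same = [StreamInfo(1280, "yuv420p"), StreamInfo(1280, "yuv420p")]
mixed = [StreamInfo(1280, "yuv420p"), StreamInfo(1920, "yuv420p")]
print(can_vstack(same))   # True
print(can_vstack(mixed))  # False
```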

Demos

Introducing video merge Robot: image & audio to video August 7, 2013
A happy 2014 from Transloadit! January 14, 2014
On upgrades & goodbyes August 8, 2014
Kicking Transloadit into gear for the new year February 1, 2015
Enhancing FFmpeg for superior encoding performance July 30, 2015
Happy 2016 from Transloadit December 31, 2015
New pricing model for future Transloadit customers February 7, 2018
Mastering audio sync with Transloadit's audio delay March 12, 2019
Tutorial: using /video/merge to develop video slideshows June 14, 2019
No-code real-time video uploading with Bubble & Transloadit August 2, 2019
Let's Build: video from album art with Transloadit October 10, 2021
Automatically generate music previews from Spotify November 16, 2021
Build a Reddit video subtitling bot with Transloadit February 10, 2022
Let's Build: music card generator with Transloadit May 5, 2022
Creating engaging audio visualizations with Transloadit April 2, 2023
