Using mkvmerge to extract and merge subtitles

Published on: - 2 years, 8 months ago

An article tagged as: linux, mkvmerge

Overview

I have a video in the MKV format. The video is in English, but there are some foreign language parts. I want subtitles to appear when any foreign language is being spoken.

Here are a few well-known examples:-

Media                    Language
The Lord of the Rings    Elvish
Avatar                   Na'vi
Game of Thrones          Dothraki

Subtitles can come in a variety of different formats:-

Format    Description
SRT       Subtitles stored as text
PGS       Subtitles stored as graphics, commonly found on Blu-rays
VobSub    Subtitles stored as graphics, commonly found on DVDs

Subtitles in the SRT format are the easiest to work with as, being text files, they can be modified in a text-editor.
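
To make the "text file" point concrete, here is a minimal sketch of what an SRT file looks like. The file name sample.srt, the timings and the dialogue are all made up for illustration. Each cue is a sequence number, a start and end time separated by "-->", then the text, followed by a blank line.

```shell
# Write a tiny, made-up SRT file (two cues) to show the format.
cat > sample.srt <<'EOF'
1
00:01:15,000 --> 00:01:18,500
Where are you taking me?

2
00:01:20,250 --> 00:01:23,000
You will see soon enough.
EOF

# Count the cues by counting the timing lines.
cue_count=$(grep -c -- '-->' sample.srt)
echo "sample.srt contains $cue_count cues"
```

Because the format is this simple, fixing a typo or nudging a timing is just a matter of editing the file and saving it.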

Subtitles can be implemented in various ways:-

Implementation     Description
Burned/embedded    The subtitles are burned onto the actual video frames - these cannot be modified or removed
Soft               The subtitles are in the media file as a separate track - these can be turned on/off, removed or modified
Separate           The subtitles are in a separate file - these can be turned on/off, removed or modified

In this article I will show you how to extract soft subtitles into a separate file for editing, and how to merge separate subtitles into a media file, thereby turning them into soft subtitles.

Definitions

Here are a few definitions to help you read this article

MKV

MKV is the file extension used by Matroska, an open media container format. An MKV file can contain tracks of audio, video and subtitles, plus other information like chapter markers. There are lots of media containers available but MKV is a popular one, and this article only deals with MKV files.

mkvmerge

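mkvmerge is a command-line tool from the MKVToolNix suite that merges audio, video and subtitle tracks into an MKV file. Its companion tool mkvextract, used later in this article, extracts tracks from an MKV file.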

Forced subtitles

Forced subtitles are subtitles that appear in a movie when a language is being spoken that is not the same as the audio language. For example, if you watch a movie in English and a character suddenly starts speaking German, the forced subtitles translate just those German lines.

Kodi / XBMC

Kodi is a very popular media player, which used to be called XBMC. There are many media players out there, but this one has a useful feature where a video file containing a forced subtitle will automatically display that subtitle. This saves a lot of messing around, and means you don't have to remember to enable the subtitle for the movie you are watching. Not all media players respect forced subtitles, which is one of the reasons I like Kodi.

Merge a subtitle file

I have a movie called Iron Man that is in English, but in several places Arabic is spoken. I have downloaded the forced English subtitles for the movie in the SRT format. Now I want to merge the forced subtitles into the movie and make them play automatically.

So, I have the following 2 files - Iron Man.mkv and forced.srt

I can use the following command to merge the two together:-

mkvmerge -o output.mkv -S "Iron Man.mkv" --language "0:eng" --track-name "0:Forced" --forced-track "0:yes" --default-track "0:yes" "forced.srt"

Here is a breakdown of this mkvmerge command

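Option            Description
-o                Outputs to the file specified
-S                Exclude all existing subtitles
"Iron Man.mkv"    Name of the media file to read from
--language        Subtitle language
--track-name      Name of subtitle
--forced-track    Flag to let media players know that this is a forced subtitle
--default-track   Flag to let media players know that this is the default subtitle
"forced.srt"      Full name of the subtitle file to be used

The "0:" prefix on each value is a track ID. These options apply to the file that follows them, so "0" here means the first (and only) track in forced.srt.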

This may take a few minutes to run. When complete, output.mkv will be a merge of Iron Man.mkv and forced.srt, and the file will not contain any other subtitles.
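
If you do this often, the command can be wrapped in a small shell function. This is only a sketch: the function name merge_forced and the DRY_RUN switch are my own inventions, and the mkvmerge options are exactly the ones from the command above.

```shell
# Hypothetical helper wrapping the merge command from this article.
# Usage: merge_forced MOVIE SUBTITLE OUTPUT
# Set DRY_RUN=1 to print the command instead of running it.
merge_forced() {
    ${DRY_RUN:+echo} mkvmerge -o "$3" -S "$1" \
        --language "0:eng" --track-name "0:Forced" \
        --forced-track "0:yes" --default-track "0:yes" \
        "$2"
}

# Print the command that would run for the Iron Man example.
DRY_RUN=1 merge_forced "Iron Man.mkv" "forced.srt" "output.mkv"
```

Drop the DRY_RUN=1 prefix to run mkvmerge for real. Note that the printed command loses its quotes, so it is for checking the options rather than for copying and pasting.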

To check that this has worked, we can use mkvmerge to show us information about a media file, however the output is rather messy:-

mkvmerge -I output.mkv

I prefer to use mediainfo:-

mediainfo "output.mkv"

This displays the following:-

General
Complete name                            : output.mkv
Format                                   : Matroska
Format version                           : Version 4 / Version 2
File size                                : 7.95 GiB
Duration                                 : 2h 6mn

Video
ID                                       : 1
Format                                   : AVC
Format/Info                              : Advanced Video Codec
Format profile                           : High@L4.0
Format settings, CABAC                   : Yes
Format settings, ReFrames                : 5 frames
Codec ID                                 : V_MPEG4/ISO/AVC
Duration                                 : 2h 6mn
Bit rate                                 : 8 385 Kbps
Width                                    : 1 920 pixels
Height                                   : 800 pixels
Display aspect ratio                     : 2.40:1
Frame rate                               : 23.976 fps
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Scan type                                : Progressive
Bits/(Pixel*Frame)                       : 0.228
Language                                 : English
Default                                  : Yes
Forced                                   : No

Audio
ID                                       : 2
Format                                   : AC-3
Format/Info                              : Audio Coding 3
Mode extension                           : CM (complete main)
Codec ID                                 : A_AC3
Duration                                 : 2h 6mn
Bit rate mode                            : Constant
Bit rate                                 : 640 Kbps
Channel(s)                               : 6 channels
Channel positions                        : Front: L C R, Side: L R, LFE
Sampling rate                            : 48.0 KHz
Bit depth                                : 16 bits
Compression mode                         : Lossy
Stream size                              : 577 MiB (7%)
Title                                    : AC3 640 Kbps
Language                                 : English
Default                                  : Yes
Forced                                   : No

Text #1
ID                                       : 3
Format                                   : UTF-8
Codec ID                                 : S_TEXT/UTF8
Codec ID/Info                            : UTF-8 Plain Text
Title                                    : Forced
Language                                 : English
Default                                  : Yes
Forced                                   : Yes

You will see that the last track is our subtitle track. This is now a soft subtitle and is flagged as both default and forced. Kodi will display this automatically.

Extract subtitles

I have a movie called Ip Man. The audio is in Chinese. It contains several subtitle tracks in different languages. The English subtitle is not set as the default subtitle, nor is it set as the forced one.

Strictly speaking, the English subtitle can never be called the forced subtitle here. If the audio language is Chinese, then a forced subtitle would have to be in Chinese and cover only the non-Chinese spoken parts. However, for my purposes, I want the English subtitle to always display, so I am going to flag it as default and forced.

The difficulty is that mkvmerge does not modify an existing file; it always writes a new one. (MKVToolNix also includes mkvpropedit, which can change track flags in place, but extracting also gives me a separate file I can edit.) So the approach I use is to extract the subtitle and then merge it again, setting the default and forced flags along the way.

First, take a look at the tracks within the file

mediainfo "Ip Man.mkv"

This displays the following:-

General
Complete name                            : Ip Man.mkv
Format                                   : Matroska
Format version                           : Version 4 / Version 2
File size                                : 7.93 GiB
Duration                                 : 1h 46mn
Overall bit rate                         : 10.7 Mbps

Video
ID                                       : 1
Format                                   : AVC
Format/Info                              : Advanced Video Codec
Format profile                           : High@L4.1
Format settings, CABAC                   : Yes
Format settings, ReFrames                : 5 frames
Codec ID                                 : V_MPEG4/ISO/AVC
Duration                                 : 1h 46mn
Bit rate                                 : 9 161 Kbps
Width                                    : 1 920 pixels
Height                                   : 816 pixels
Display aspect ratio                     : 2.35:1
Frame rate                               : 24.000 fps
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Scan type                                : Progressive
Bits/(Pixel*Frame)                       : 0.244
Stream size                              : 6.65 GiB (84%)
Language                                 : Chinese
Default                                  : Yes
Forced                                   : No

Audio
ID                                       : 2
Format                                   : DTS
Format/Info                              : Digital Theater Systems
Codec ID                                 : A_DTS
Duration                                 : 1h 46mn
Bit rate mode                            : Constant
Bit rate                                 : 1 510 Kbps
Channel(s)                               : 6 channels
Channel positions                        : Front: L C R, Side: L R, LFE
Sampling rate                            : 48.0 KHz
Bit depth                                : 16 bits
Compression mode                         : Lossy
Stream size                              : 1.12 GiB (14%)
Language                                 : Chinese
Default                                  : Yes
Forced                                   : No

Text
ID                                       : 3
Format                                   : UTF-8
Codec ID                                 : S_TEXT/UTF8
Codec ID/Info                            : UTF-8 Plain Text
Title                                    : English
Language                                 : English
Default                                  : No
Forced                                   : No

Text
ID                                       : 4
Format                                   : UTF-8
Codec ID                                 : S_TEXT/UTF8
Codec ID/Info                            : UTF-8 Plain Text
Title                                    : Chinese
Language                                 : Chinese
Default                                  : No
Forced                                   : No

Text
ID                                       : 5
Format                                   : UTF-8
Codec ID                                 : S_TEXT/UTF8
Codec ID/Info                            : UTF-8 Plain Text
Title                                    : French
Language                                 : French
Default                                  : No
Forced                                   : No

Track 3 is the English subtitle. But be careful: when listing tracks, mediainfo starts at 1, whereas mkvmerge and mkvextract start at 0. So when using a track ID provided by mediainfo, you should subtract 1 from the ID number. Here that means I will refer to the English subtitle as track 2.
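
Instead of subtracting by hand, you can ask mkvmerge for its own IDs with "mkvmerge -i". The sketch below picks out the subtitle track IDs from that listing. The helper name subtitle_ids is made up, and I am assuming the listing uses lines of the form "Track ID 2: subtitles (SubRip/SRT)", which may vary between versions. The sample listing is written by hand to match the Ip Man file, not captured from a real run.

```shell
# Hypothetical helper: read `mkvmerge -i` output on stdin and print
# the IDs of the subtitle tracks, one per line.
# Assumes lines of the form "Track ID 2: subtitles (SubRip/SRT)".
subtitle_ids() {
    sed -n 's/^Track ID \([0-9][0-9]*\): subtitles.*/\1/p'
}

# Sample listing, written by hand to match the Ip Man file above.
sample='Track ID 0: video (MPEG-4p10/AVC/h.264)
Track ID 1: audio (DTS)
Track ID 2: subtitles (SubRip/SRT)
Track ID 3: subtitles (SubRip/SRT)
Track ID 4: subtitles (SubRip/SRT)'

printf '%s\n' "$sample" | subtitle_ids
```

Against a real file this would be used as: mkvmerge -i "Ip Man.mkv" | subtitle_ids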

Next, extract the subtitle track

mkvextract tracks "Ip Man.mkv" 2:"forced.srt"

Here is a breakdown of this mkvextract command

Option          Description
tracks          Tells mkvextract that you are extracting tracks from the media file
"Ip Man.mkv"    Name of the media file you want to extract from
2               ID of track to be extracted, in this case #2
"forced.srt"    The name you want the extracted track to be called

This command might take a few minutes to run. When complete you will see a new file called forced.srt
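
The subtract-one step and the extraction can be combined into one small function. Again this is only a sketch: extract_sub and DRY_RUN are names I have made up, and it relies on the mediainfo ID being exactly one higher than the mkvextract ID, as described above.

```shell
# Hypothetical helper: extract one subtitle track, given the track ID
# shown by mediainfo. Subtracts 1 to get the mkvextract track ID.
# Usage: extract_sub MOVIE MEDIAINFO_ID OUTPUT
# Set DRY_RUN=1 to print the command instead of running it.
extract_sub() {
    mkv_id=$(( $2 - 1 ))
    ${DRY_RUN:+echo} mkvextract tracks "$1" "$mkv_id:$3"
}

# Print the command for the English subtitle (mediainfo ID 3).
DRY_RUN=1 extract_sub "Ip Man.mkv" 3 "forced.srt"
```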

A subtitle is just a text file, so you can open it with a normal text editor, if you need to.

Finally, I want to merge Ip Man.mkv and forced.srt together, removing any other subtitles and setting the default and forced flags

mkvmerge -o "output.mkv" -S "Ip Man.mkv" --language "0:eng" --track-name "0:Forced" --forced-track "0:yes" --default-track "0:yes" "forced.srt"

After a few minutes I will have a file called output.mkv containing my subtitle with the flags set.

When merging subtitles into a media file it is good practice to keep any existing subtitles. This could be done:-

mkvmerge -o "output.mkv" -s 3,4 "Ip Man.mkv" --language "0:eng" --track-name "0:Forced" --forced-track "0:yes" --default-track "0:yes" "forced.srt"

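Using the -s option keeps only the subtitle tracks listed, in this case mkvmerge tracks 3 and 4. These are the Chinese and French subtitles, which mediainfo lists as IDs 4 and 5. The original English track is left out because it is added back from forced.srt.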

Useful settings

mkvmerge has a lot of settings that can be used, details of which are in the official documentation. But here are some common settings when tweaking movies and TV shows.

Option    Description
-A        Remove all audio tracks
-a        Keep only the audio tracks listed
-S        Remove all subtitle tracks
-s        Keep only the subtitle tracks listed
-D        Remove all video tracks
-d        Keep only the video tracks listed
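
Here is a short sketch of how these options combine. The output file names and the run helper are made up for illustration, and the track IDs are mkvmerge IDs for the Ip Man file used earlier. With DRY_RUN=1 set, the commands are only printed.

```shell
# Hypothetical helper: run a command, or just print it when
# DRY_RUN is set to a non-empty value.
run() { ${DRY_RUN:+echo} "$@"; }

DRY_RUN=1

# Drop every subtitle track.
run mkvmerge -o "no-subs.mkv" -S "Ip Man.mkv"

# Keep only audio track 1 and subtitle track 2 (mkvmerge IDs).
run mkvmerge -o "trimmed.mkv" -a 1 -s 2 "Ip Man.mkv"

# Drop video and subtitles, leaving just the audio.
run mkvmerge -o "audio.mka" -D -S "Ip Man.mkv"
```

Remove the DRY_RUN=1 line to run the commands for real.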