Extracting DVD Captions: Home

Tips on Extracting Closed Captions from DVD programs

Subtitles vs. Captions

Extracting Captions from DVDs

This research guide describes the process of extracting captions, as opposed to subtitles, from commercial DVDs for the purpose of creating a streamable video file. In theory, subtitles transcribe the spoken dialog within a program, while captions transcribe the spoken dialog and also describe important background noises. In practice, this distinction is often blurred. On commercial DVDs, the term captions usually refers to data on a reserved channel within a video stream, while the term subtitles refers to separate video streams that can be overlaid on top of the main video stream. Because captions and subtitles are stored differently on a DVD, different software and procedures are needed to ensure that they are retained when a DVD is processed into a streamable file. Most programs that rip DVD programs into computer files can include subtitles just by clicking a menu choice, so that process is not described here. Most of those programs do NOT carry captions forward into the output, however, so the steps needed to extract and remix the captions are described below.

Many older DVDs, especially if they contain programs that originated as television broadcasts, have captions (the reserved channel within the video program) and no subtitles. Newer DVDs tend to have only subtitles (separate video streams) and no captions. Some DVDs have both, and in such cases both must be inspected to determine which is to be retained in the streamable video file. Of course, many DVDs have neither.

Video Files on a DVD

On commercial DVDs, video programs are always located in a folder named VIDEO_TS. One or more video objects reside within that folder. Each video object is a file with the extension .VOB, containing all or part of a video program together with any related audio, subtitle, and caption data. The maximum size of a .VOB file is roughly 1 GB, so longer video programs require multiple .VOB files. The .VOB files making up a single video program are numbered sequentially, and viewing the program correctly requires playing the .VOB files in numerical order.

A DVD often contains multiple programs, such as multiple episodes in a series or a main movie and companion programs. Each separate program on a DVD is called a title set, and each title set is identified by a number. While title sets are often numbered sequentially, this is a matter of practice rather than a requirement, so each title set number is effectively an arbitrary identifier.

The name of each .VOB file reflects the identifying number of the title set and the position of the .VOB file in that title set. A DVD might, for example, contain files such as these:



The "VTS" in these file names stands for Video Title Set, and is followed by the identifying number of the title set. The files depicted here include three title sets, numbered 1, 2, and 3. Title set 1 consists of three .VOB files, numbered 0 through 2; title set 2 consists of a single .VOB file; and title set 3 consists of two .VOB files, numbered 1 and 2.
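
The naming convention can also be read mechanically. The following is a minimal Python sketch, not part of any tool named in this guide, that splits names of the form VTS_nn_m.VOB into a title set number and a sequence number. The file names in the listing are hypothetical, chosen only to match the description above; in particular, the sequence number of the single file in title set 2 is an assumption.

```python
import re
from collections import defaultdict

# Title-set video objects are named VTS_<title set>_<sequence>.VOB
VOB_NAME = re.compile(r"^VTS_(\d{2})_(\d)\.VOB$", re.IGNORECASE)


def group_by_title_set(file_names):
    """Map each title set number to its sorted list of sequence numbers."""
    groups = defaultdict(list)
    for name in file_names:
        match = VOB_NAME.match(name)
        if match is None:
            continue  # not a title-set .VOB file (for example, an .IFO file)
        title_set, sequence = int(match.group(1)), int(match.group(2))
        groups[title_set].append(sequence)
    return {ts: sorted(seqs) for ts, seqs in sorted(groups.items())}


# Hypothetical listing (assumed names, for illustration only)
example_files = [
    "VTS_01_0.VOB", "VTS_01_1.VOB", "VTS_01_2.VOB",
    "VTS_02_1.VOB",
    "VTS_03_1.VOB", "VTS_03_2.VOB",
]

print(group_by_title_set(example_files))
# → {1: [0, 1, 2], 2: [1], 3: [1, 2]}
```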

Processing DVD Captions - Overview

Processing captions requires the following technical steps:

  1. Identify the title set number of the desired program.
  2. Decrypt the .VOB files (if encrypted).
  3. Extract the captions from the DVD into a timed-text file.
  4. Rip the video program from the DVD, specifying as input the timed-text file created in the preceding step.
  5. Verify that the output video contains the correct captions and that they are in sequence with the video.

Each of these steps is described in more detail below.

Step 1: Identify the Title Set Number of the Desired Program

Typically, a DVD is processed to extract one video program, with its captions, out of several that exist on a DVD. This requires identifying which title set, out of two or more, contains the desired program. This is usually not difficult, but is not always straightforward. The only real problem is that the title set numbers on the DVD don't correspond in any direct way to human-readable titles such as "Gone with the Wind" or "Episode 3." 

The simplest way to determine which title set number to process is often to explore the DVD with a ripping program such as DVDFab or MacX DVD Ripper, or to play the video with a media player such as VLC. Options in such programs can reveal which title number is being played, and can also reveal details such as program length, subtitle languages, and types of captions.

Other approaches are sometimes useful as well. With feature films, the longest program on the DVD is usually the one targeted for processing. A visual inspection of the number of .VOB files within each title set often shows that one program on the disc is much longer than the others. A program such as IFOedit can display the contents of information (.IFO) files on a DVD to reveal details such as program length and subtitle languages.
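
The visual inspection described above can be approximated with a short script. This is a rough Python sketch, assuming the VIDEO_TS folder has been copied to or mounted on the local file system; the function names are invented for illustration. It uses total file size as a stand-in for program length, which is only a heuristic, and it skips files with sequence number 0 on the assumption that they hold menus rather than program video.

```python
import re
from pathlib import Path

VOB_NAME = re.compile(r"^VTS_(\d{2})_(\d)\.VOB$", re.IGNORECASE)


def title_set_sizes(video_ts_dir):
    """Return {title set number: total bytes of its program .VOB files}."""
    totals = {}
    for path in Path(video_ts_dir).iterdir():
        match = VOB_NAME.match(path.name)
        if match is None or int(match.group(2)) == 0:
            continue  # skip other files and sequence 0 (assumed to be a menu)
        title_set = int(match.group(1))
        totals[title_set] = totals.get(title_set, 0) + path.stat().st_size
    return totals


def largest_title_set(video_ts_dir):
    """Return the title set number holding the most video data, or None."""
    totals = title_set_sizes(video_ts_dir)
    return max(totals, key=totals.get) if totals else None
```

Calling largest_title_set with the path to a VIDEO_TS folder returns the most likely candidate for the main program, which should still be confirmed by playing it.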

Step 2: Decrypt the DVD program

Video programs on commercial DVDs often contain a distortion signal that is included to make illicit copying more difficult. Although easily filtered out by hardware and software players, the distortion signal can make it difficult to extract captions. If playing an individual .VOB file with a player like VLC produces a visually distorted program, or it seems impossible to extract captions from a program known to have them, the video program likely contains such a distortion signal.

To extract captions from a video program containing a distortion signal, it is necessary to decrypt each of the .VOB files making up the program. This is done with a decryption program such as DVD Decrypter. The decryption program produces a separate copy of each .VOB file, and it is those decrypted copies that must be input to the next step, described below.

Step 3: Run Closed Caption Extraction Program

The multi-platform CCExtractor program can be used to create a timed-text file from captions on a DVD. The program provides a wide array of options, only a few of which are used under most circumstances. After starting the program, complete these steps:

  • Click on the Input files tab, and paste in the names of the desired .VOB files. The files must be listed from top to bottom in correct, numerical order.
  • Click on the Input options tab. Make sure that the button labeled "These files are part of the same video. They were cut by a generic tool" is selected in the "Split type" section.
  • Click on the Output(1) tab. Specify the output type, which for most purposes should be a SubRip file with extension .SRT. Click the radio button labeled "Save the output in this file" to specify the output folder and file name.
  • Click on the Execution tab and then the start button. Caption text should scroll down the preview box while extraction is in progress.

When this process is completed, a valid subtitle file should be found in the specified folder.

A few notes on this process:

  • Generally, the first .VOB file in the list of input files should be the file with sequence number 1. If the title set includes a .VOB file with sequence number zero, that is likely an on-screen menu file and including it in the list of input files to process will result in subtitles that are out of sync with the video program. 
  • By default, CCExtractor assumes that captions in the .VOB files adhere to a standard known as CEA-608. This is true of most captions on most DVDs. Some DVDs might have captions implemented using a later standard known as CEA-708. CCExtractor has an option for handling these, located in the Decoders tab. Programs like IFOedit and VLC Media Player can report on the format of captions within a DVD program.
  • Subtitles in the SubRip (.SRT) format are likely to be the most compatible with whatever software is used in the next (ripping) step. If a different format is needed, CCExtractor likely has an option for the required type in the Output(1) tab.
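
CCExtractor can also be run from a command line instead of its graphical interface. The following Python sketch shows one way the input list might be assembled: it selects the .VOB files of one title set, skips sequence number 0, and orders the rest numerically. The function names are hypothetical. The sketch assumes that the executable is named ccextractor, that it accepts input files as positional arguments followed by -o and an output file name, and that SubRip is its default output format; these assumptions should be checked against the help text of the installed version.

```python
import re
from pathlib import Path


def caption_input_files(video_ts_dir, title_set):
    """List a title set's .VOB files in numerical order, skipping sequence 0."""
    pattern = re.compile(rf"^VTS_{title_set:02d}_([1-9])\.VOB$", re.IGNORECASE)
    found = []
    for path in Path(video_ts_dir).iterdir():
        match = pattern.match(path.name)
        if match:
            found.append((int(match.group(1)), path))
    return [path for _, path in sorted(found, key=lambda item: item[0])]


def ccextractor_command(vob_files, output_srt, executable="ccextractor"):
    """Build an argument list; the -o option is an assumption to verify."""
    return [executable, *(str(f) for f in vob_files), "-o", str(output_srt)]
```

The resulting list could be passed to subprocess.run, pointing at the decrypted copies of the .VOB files where decryption was needed.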

Step 4: Rip the Video Program With the Extracted Captions

Using a DVD ripping program such as DVDFab, HandBrake, or MacX DVD Ripper, rip the desired program from the original DVD, specifying as input the timed-text file created above. Details vary from program to program, but all allow an external subtitles file to be "burned in" or "directly rendered" to the streamable video file.
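
For those who prefer a command line, HandBrake also ships a command-line version. The Python sketch below only builds an argument list and does not run anything. It assumes the executable is named HandBrakeCLI and that the installed version supports the --srt-file and --srt-burn options; both should be confirmed with HandBrakeCLI --help. The title argument is the title number as HandBrake reports it when scanning the disc, which may not equal the title set number identified in Step 1. The function name and paths are hypothetical.

```python
def handbrake_command(source, title, srt_file, output,
                      executable="HandBrakeCLI"):
    """Build a HandBrakeCLI argument list that burns in an external SRT file.

    The option names are assumptions to verify against the installed version.
    """
    return [
        executable,
        "--input", str(source),       # DVD drive or VIDEO_TS folder
        "--title", str(title),        # title number as reported by HandBrake
        "--output", str(output),      # streamable file to create
        "--srt-file", str(srt_file),  # timed-text file from Step 3
        "--srt-burn",                 # render the captions into the picture
    ]


# Hypothetical paths, for illustration only
command = handbrake_command("VIDEO_TS", 1, "captions.srt", "program.mp4")
print(" ".join(command))
```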

Step 5: Verify Results

Play the streamable video file with a media player such as VLC. The program should display clearly, and the subtitles should be displayed as a permanent part of the program. The text of the subtitles should be in sync with the audio dialog, and should remain in sync throughout the program.

Editing the Subtitles

Once the DVD captions have been extracted into a timed-text file, the file can easily be edited for details such as spelling, word spacing, and punctuation. The most common timed-text formats are SubRip (.SRT) and WebVTT (.VTT), and any text editing program can be used to adjust their contents. There are also programs specifically designed for editing subtitles, some of which can assist with adjusting time stamps when subtitles are out of sync with the audio dialog.
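
When the captions are out of sync by a constant amount, the time stamps in a SubRip file can be shifted with a few lines of code. This is a minimal Python sketch with hypothetical function names, assuming the usual SubRip timing line of the form HH:MM:SS,mmm --> HH:MM:SS,mmm. It does not handle drift that grows over the course of a program, and it clamps negative results to zero.

```python
import re

TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_timestamp(match, offset_ms):
    """Return one SubRip time stamp moved by offset_ms milliseconds."""
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    total = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis
    total = max(total + offset_ms, 0)  # never go before the start
    hours, rest = divmod(total, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def shift_srt(text, offset_ms):
    """Shift every timing line in SubRip text; caption text is untouched."""
    shifted = []
    for line in text.splitlines():
        if "-->" in line:
            line = TIMESTAMP.sub(lambda m: shift_timestamp(m, offset_ms), line)
        shifted.append(line)
    return "\n".join(shifted)


sample = "1\n00:00:01,000 --> 00:00:03,500\n[door slams]"
print(shift_srt(sample, 1500))
# The timing line becomes: 00:00:02,500 --> 00:00:05,000
```

A positive offset delays the captions and a negative offset makes them appear earlier.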

Minor editing of the subtitles file may sometimes be necessary and achievable, but extensive editing of a large subtitles file can be difficult and time-consuming. If the quality of the captions extracted from a DVD is poor, it is often more efficient to send the video program out for professional captioning than it is to try to fix a bad set of captions.