Web Video Text Tracks Format (WebVTT)
Web Video Text Tracks Format (WebVTT) is a plain-text file format for displaying timed text tracks that are synchronized with content in <video> and <audio> elements.
These can be used, for example, to add closed captions and subtitle text overlays to a <video>.
The WebVTT files associated with a media element are added using the <track> element; see Displaying VTT content defined in a file.
A media element can be associated with a number of files, each representing different kinds of timed data, such as closed captions, subtitles, or chapter headings, translated into different locales.
Note: WebVTT content can also be created and managed programmatically using the WebVTT API.
Overview
WebVTT files have a MIME type of text/vtt and the file extension .vtt.
The content must be encoded using UTF-8.
The structure of a WebVTT file consists of the following components, some of them optional, in this order:
- A header, consisting of an optional byte order mark (BOM), the string WEBVTT, and an optional text header separated from WEBVTT by one or more space or tab characters (in WebVTT files, tabs and spaces are interchangeable).
- One or more blank lines, each of which is equivalent to two consecutive newlines.
- Zero or more STYLE, REGION, or NOTE blocks, separated by one or more blank lines.
- Zero or more cue or NOTE blocks, separated by one or more blank lines.
A simple WebVTT file that has the WEBVTT string (but no header text), a NOTE block, and two cues is shown below:

    WEBVTT

    NOTE
    This is a multi-line note block.
    These are used for comments by the author
    Two cue blocks are defined below.

    00:01.000 --> 00:04.000
    Never drink liquid nitrogen.

    00:05.000 --> 00:09.000
    Because:
    - It will perforate your stomach.
    - You could die.

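The block structure described above can be sketched in Python. This is a minimal illustration, not a conforming parser: the helper names split_blocks and classify are hypothetical, and the classification looks only at the first line of each block.

```python
import re

def split_blocks(text: str) -> list[str]:
    """Split WebVTT text into blocks separated by one or more blank lines."""
    # Normalize line endings and drop an optional leading byte order mark.
    text = text.replace("\r\n", "\n").replace("\r", "\n").lstrip("\ufeff")
    return [b for b in re.split(r"\n{2,}", text.strip("\n")) if b]

def classify(block: str) -> str:
    """Return a rough label for a block based on its first line (assumption:
    anything that is not a header, NOTE, STYLE, or REGION block is a cue)."""
    first = block.split("\n", 1)[0]
    if first == "WEBVTT" or first.startswith(("WEBVTT ", "WEBVTT\t")):
        return "header"
    for keyword in ("NOTE", "STYLE", "REGION"):
        if first == keyword or first.startswith((keyword + " ", keyword + "\t")):
            return keyword.lower()
    return "cue"

SAMPLE = """WEBVTT

NOTE
This is a multi-line note block.

00:01.000 --> 00:04.000
Never drink liquid nitrogen.

00:05.000 --> 00:09.000
Because:
- It will perforate your stomach.
- You could die.
"""

labels = [classify(b) for b in split_blocks(SAMPLE)]
print(labels)  # ['header', 'note', 'cue', 'cue']
```

A real parser would also check block order (for example, that STYLE blocks come before the first cue), which this sketch leaves out.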
The following sections explain the parts of the file, including those not used in the example above.
WebVTT header
WebVTT files start with a header block containing the following:
- An optional byte order mark (BOM), which is Unicode character U+FEFF.
- The string WEBVTT.
- An optional text header to the right of WEBVTT.
  - There must be at least one space or tab after WEBVTT.
  - You could use this header to add a description to the file.
  - You may use anything in the text header except newlines or the string -->.

The WEBVTT string is the only required part of the WebVTT file, so the simplest possible WebVTT file would look like this:

    WEBVTT

The example below shows a header with text. Note that this text must be separated from WEBVTT by at least one space or tab.

    WEBVTT This file has no cues.

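The header rules above can be checked with a few lines of Python. The function name is_valid_header_line is hypothetical, and the sketch assumes it receives only the first line of the file, already decoded from UTF-8.

```python
def is_valid_header_line(line: str) -> bool:
    """Check the first line of a WebVTT file against the header rules."""
    line = line.removeprefix("\ufeff")  # optional byte order mark
    if not line.startswith("WEBVTT"):
        return False
    rest = line[len("WEBVTT"):]
    if rest == "":
        return True  # a bare "WEBVTT" is a complete header
    # Header text must be separated from WEBVTT by a space or tab.
    if rest[0] not in (" ", "\t"):
        return False
    # The text may not contain newlines or the string "-->".
    return "\n" not in rest and "\r" not in rest and "-->" not in rest

print(is_valid_header_line("WEBVTT This file has no cues."))  # True
print(is_valid_header_line("WEBVTT a --> b"))                 # False
```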
WebVTT cues
A cue defines a single caption, subtitle, or other text block to be displayed over a particular time interval.
Cues must appear after the header and any STYLE or REGION blocks.
Each cue consists of the following parts, in order:
- An optional cue identifier followed by a newline.
- Cue timings that indicate the time range in which the payload text should be displayed. These are optionally followed by cue settings with at least one space before the first setting and between each setting, followed by a single newline.
- The cue payload text, which may span multiple lines, and will be terminated by an empty line.
Here is an example of a simple cue.
The first line specifies the cue's display start and end times, separated using the string -->.
The second line defines the text to be displayed.

    00:01.000 --> 00:04.000
    Never drink liquid nitrogen.

The next cue is slightly more complicated.
It starts with a cue identifier, 1 - Title Crawl, which may be used to reference the cue in JavaScript and CSS.
It also has cue settings after the cue timings to set the cue position.

    1 - Title Crawl
    00:05.000 --> 00:09.000 line:0 position:20% size:60% align:start
    Because:
    - It will perforate your stomach.
    - You could die.

Note that the output will respect line breaks in the payload text, which allows you to create bulleted lists using hyphen (-) characters as shown.
Generally you should only insert these breaks when needed, as the browser will wrap text appropriately.
It is important not to use "extra" blank lines within a cue, for example between the timings line and the cue payload, or within the payload, because a blank line ends the current cue.
Each part of the cue is explained in more detail in the following sections.
Cue identifier
The identifier is a name that identifies the cue. It can be used to reference the cue from JavaScript or CSS. It must not contain a newline and cannot contain the string -->. It must end with a single new line. Identifiers do not have to be unique, although it is common to number them (e.g., 1, 2, 3).
The example below shows a file with several cues that include identifiers:

    WEBVTT

    1
    00:00:22.230 --> 00:00:24.606
    This is the first subtitle.

    2 Some Text
    00:00:30.739 --> 00:00:34.074
    This is the second.

    3
    00:00:34.159 --> 00:00:35.743
    This is the third

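The cue layout (optional identifier, timing line, payload) can be sketched as follows. The names Cue and parse_cue_block are hypothetical, and the sketch relies on the rule that an identifier cannot contain the string "-->" to tell the identifier from the timing line.

```python
from typing import NamedTuple, Optional

class Cue(NamedTuple):
    identifier: Optional[str]
    timing_line: str  # timings plus any cue settings, unparsed
    payload: str

def parse_cue_block(block: str) -> Cue:
    """Split one cue block (no blank lines inside) into its parts."""
    lines = block.split("\n")
    identifier = None
    # If the first line has no "-->", it is the optional identifier.
    if "-->" not in lines[0]:
        identifier = lines[0]
        lines = lines[1:]
    if not lines or "-->" not in lines[0]:
        raise ValueError("cue block has no timing line")
    return Cue(identifier, lines[0], "\n".join(lines[1:]))

cue = parse_cue_block(
    "2 Some Text\n00:00:30.739 --> 00:00:34.074\nThis is the second."
)
print(cue.identifier)  # 2 Some Text
print(cue.payload)     # This is the second.
```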
Cue timings
A cue timing indicates the time interval when the cue is shown. It has a start and end time, represented by timestamps. The end time must be greater than the start time, and the start time must be greater than or equal to all previous start times.
Cues may have overlapping timings, unless the WebVTT file is being used for chapters (the <track> kind is chapters).
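The ordering rules above can be expressed as a small check. This is a sketch under stated assumptions: check_cue_order is a hypothetical helper, timings are (start, end) pairs already converted to milliseconds, and the chapters flag stands in for a track whose kind is chapters.

```python
def check_cue_order(
    timings: list[tuple[int, int]], chapters: bool = False
) -> list[str]:
    """Return a list of problems found in a sequence of cue timings."""
    problems = []
    latest_start = None
    latest_end = None
    for index, (start, end) in enumerate(timings):
        if end <= start:
            problems.append(f"cue {index}: end time is not after start time")
        if latest_start is not None and start < latest_start:
            problems.append(f"cue {index}: starts before an earlier cue")
        # Overlap is only a problem for chapter tracks.
        if chapters and latest_end is not None and start < latest_end:
            problems.append(f"cue {index}: overlaps an earlier chapter")
        latest_start = start if latest_start is None else max(latest_start, start)
        latest_end = end if latest_end is None else max(latest_end, end)
    return problems

overlapping = [(0, 10_000), (5_000, 60_000), (30_000, 50_000)]
print(check_cue_order(overlapping))                      # []
print(len(check_cue_order(overlapping, chapters=True)))  # 2
```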
Each cue timing contains five components:
- A timestamp for the start time.
- At least one space.
- The string -->.
- At least one space.
- A timestamp for the end time, which must be greater than the start time.
The timestamps can be specified in one of the following two formats:
- mm:ss.ttt
- hh:mm:ss.ttt

Where the components are defined as follows:
- hh: Represents hours and must be at least two digits. It can be greater than two digits (e.g., 9999:00:00.000).
- mm: Represents minutes and must be between 00 and 59, inclusive.
- ss: Represents seconds and must be between 00 and 59, inclusive.
- ttt: Represents milliseconds and must be between 000 and 999, inclusive.
Here are a few cue timing examples:

Basic cue timing examples:

    00:00:22.230 --> 00:00:24.606
    00:00:30.739 --> 00:00:34.074
    00:00:34.159 --> 00:00:35.743
    00:00:35.827 --> 00:00:40.122

Overlapping cue timing examples:

    00:00:00.000 --> 00:00:10.000
    00:00:05.000 --> 00:01:00.000
    00:00:30.000 --> 00:00:50.000

Non-overlapping cue timing examples (note that each end time is greater than its start time):

    00:00:00.000 --> 00:00:10.000
    00:00:10.000 --> 00:01:00.581
    00:01:00.581 --> 00:02:00.100
    00:02:01.000 --> 00:02:02.000

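The timestamp formats above map directly onto a regular expression. The following Python sketch converts timestamps to milliseconds and validates a timing line; parse_timestamp and parse_timing are hypothetical names, and the sketch is stricter than a browser parser may be (it rejects a timing line with no spaces around the arrow).

```python
import re

# Optional hours (two or more digits), then minutes, seconds, milliseconds.
_TIMESTAMP = re.compile(r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})", re.ASCII)
# Start timestamp, at least one space or tab, "-->", at least one space or tab, end.
_TIMING = re.compile(r"(\S+)[ \t]+-->[ \t]+(\S+)")

def parse_timestamp(ts: str) -> int:
    """Convert mm:ss.ttt or hh:mm:ss.ttt to milliseconds."""
    m = _TIMESTAMP.fullmatch(ts)
    if m is None:
        raise ValueError(f"invalid timestamp: {ts!r}")
    hours, minutes, seconds, millis = (int(g or 0) for g in m.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis

def parse_timing(line: str) -> tuple[int, int]:
    """Parse 'start --> end', ignoring any cue settings that follow."""
    m = _TIMING.match(line)
    if m is None:
        raise ValueError(f"invalid cue timing: {line!r}")
    start, end = parse_timestamp(m.group(1)), parse_timestamp(m.group(2))
    if end <= start:
        raise ValueError("end time must be greater than start time")
    return start, end

print(parse_timestamp("00:01.000"))                    # 1000
print(parse_timing("00:00:05.000 --> 00:01:00.000"))   # (5000, 60000)
```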
Cue settings
Cue settings are optional components that position the cue payload text over the video. This includes horizontal and vertical positions. Zero or more cue settings can be specified and used in any order so long as each setting is used no more than once.
Cue settings are added to the right of cue timings. There must be one or more spaces between the cue timing and the first setting and between each setting. A colon separates a setting's name and value. The settings are case-sensitive; use lowercase as shown. There are five available cue settings:
- vertical: Indicates that the text will be displayed vertically rather than horizontally, such as in some Asian languages. There are two possible values: rl (lines progress from right to left) and lr (lines progress from left to right).
- line: If vertical is not set, line specifies where the text appears vertically. If vertical is set, line specifies where text appears horizontally. Its value can be:
  - A line number: The position of the first line of the cue as it appears on the video. Positive numbers are counted from the top down and negative numbers are counted from the bottom up.
  - A percentage: An integer (i.e., no decimals) between 0 and 100 inclusive, which must be followed by a percent sign (%).

Line | vertical omitted | vertical:rl | vertical:lr |
---|---|---|---|
line:0 | top | right | left |
line:-1 | bottom | left | right |
line:0% | top | right | left |
line:100% | bottom | left | right |

- position: If vertical is not set, position specifies where the text will appear horizontally. If vertical is set, position specifies where the text will appear vertically. The value is a percentage between 0 and 100 inclusive.

Position | vertical omitted | vertical:rl | vertical:lr |
---|---|---|---|
position:0% | left | top | top |
position:100% | right | bottom | bottom |

- size: If vertical is not set, size specifies the width of the text area. If vertical is set, size specifies the height of the text area. The value is a percentage between 0 and 100 inclusive.

Size | vertical omitted | vertical:rl | vertical:lr |
---|---|---|---|
size:100% | full width | full height | full height |
size:50% | half width | half height | half height |

- align: Specifies the alignment of the text. Text is aligned within the space given by the size cue setting if it is set.

Align | vertical omitted | vertical:rl | vertical:lr |
---|---|---|---|
align:start | left | top | top |
align:center | centered horizontally | centered vertically | centered vertically |
align:end | right | bottom | bottom |

Here are a few examples. The first line demonstrates no settings. The second line might be used to overlay text on a sign or label. The third line might be used for a title. The fourth line might be used for an Asian language. The last three lines combine position, align, and size; two of them add a line-left or line-right keyword to the position value.

    00:00:05.000 --> 00:00:10.000
    00:00:05.000 --> 00:00:10.000 line:63% position:72% align:start
    00:00:05.000 --> 00:00:10.000 line:0 position:20% size:60% align:start
    00:00:05.000 --> 00:00:10.000 vertical:rl line:-1 align:end
    00:00:05.000 --> 00:00:10.000 position:10%,line-left align:left size:31%
    00:00:05.000 --> 00:00:10.000 position:90% align:right size:35%
    00:00:05.000 --> 00:00:10.000 position:45%,line-right align:center size:90%

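The name:value syntax for cue settings can be parsed in a few lines. In this sketch, parse_cue_settings is a hypothetical helper that accepts only the five settings described in this section and raises on anything else; that strictness is an assumption made for illustration, and a browser may simply ignore settings it does not recognize.

```python
def parse_cue_settings(text: str) -> dict[str, str]:
    """Parse 'name:value' cue settings separated by spaces or tabs."""
    known = {"vertical", "line", "position", "size", "align"}
    settings: dict[str, str] = {}
    for item in text.split():
        # A colon separates a setting's name from its value.
        name, sep, value = item.partition(":")
        if not sep or not value:
            raise ValueError(f"malformed setting: {item!r}")
        if name not in known:  # names are case-sensitive
            raise ValueError(f"unknown setting: {name!r}")
        if name in settings:
            raise ValueError(f"setting used more than once: {name!r}")
        settings[name] = value
    return settings

print(parse_cue_settings("line:0 position:20% size:60% align:start"))
# {'line': '0', 'position': '20%', 'size': '60%', 'align': 'start'}
```

The values are returned as raw strings; validating ranges such as 0 to 100 percent would be a separate step.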
Cue payload
The payload is where the cue content is defined, such as the subtitle or closed caption text. It may contain newlines but cannot contain two consecutive newlines: that would create a blank line, which indicates the end of the block.
A cue text payload cannot contain the string -->, the ampersand character (&), or the less-than sign (<).
If needed, you can instead use a character reference such as the named character reference &amp; for ampersand and &lt; for less-than.
It is also recommended that you use the greater-than escape sequence &gt; instead of the greater-than character (>) to avoid confusion with tags.
If you are using the WebVTT file for metadata, these restrictions do not apply.
Note that all major browsers allow any character reference in cues, notes, or other text. Older browser versions may support only the following subset of named character references:
Name | Character | Escape sequence |
---|---|---|
Ampersand | & | &amp; |
Less-than | < | &lt; |
Greater-than | > | &gt; |
Left-to-right mark | none | &lrm; |
Right-to-left mark | none | &rlm; |
Non-breaking space |   | &nbsp; |
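When cue text is generated from arbitrary strings, the restricted characters can be replaced with character references before writing the file. The helper name escape_cue_text below is hypothetical; the sketch covers only the three characters discussed above.

```python
def escape_cue_text(text: str) -> str:
    """Replace &, <, and > with named character references."""
    # The ampersand must be handled first so that the references
    # produced by the later replacements are not escaped again.
    return (
        text.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
    )

print(escape_cue_text("a < b && c --> d"))
# a &lt; b &amp;&amp; c --&gt; d
```

Because every > is replaced, the escaped text can no longer contain the string -->.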
Cue payload text tags
A number of tags, such as <b>, can be used for marking up and styling text within a cue.
However, if the WebVTT file is used in a <track> element where the attribute kind is chapters, then you cannot use tags.
- Timestamp tag: Timestamp tags are used to enable karaoke-style captions. The timestamp must be greater than the cue's start timestamp, greater than any previous timestamp in the cue payload, and less than the cue's end timestamp. The active text is the text between the timestamp and the next timestamp, or to the end of the payload if there is not another timestamp in the payload. Any text before the active text in the payload is previous text. Any text beyond the active text is future text.

    1
    00:16.500 --> 00:18.500
    When the moon <00:17.500>hits your eye

    1
    00:00:18.500 --> 00:00:20.500
    Like a <00:19.000>big-a <00:19.500>pizza <00:20.000>pie

    1
    00:00:20.500 --> 00:00:21.500
    That's <00:00:21.000>amore

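To see how timestamp tags divide a payload into timed segments, the payload can be split on the tags. This is an illustrative sketch; split_on_timestamps is a hypothetical name, and the function does not check that the timestamps are increasing or fall within the cue's own timings.

```python
import re
from typing import Optional

# A timestamp tag is a timestamp wrapped in angle brackets.
_TS_TAG = re.compile(r"<((?:\d{2,}:)?[0-5]\d:[0-5]\d\.\d{3})>")

def split_on_timestamps(payload: str) -> list[tuple[Optional[str], str]]:
    """Return (timestamp, text) pairs; the first pair has timestamp None."""
    # re.split with one capture group yields: text, ts, text, ts, text, ...
    parts = _TS_TAG.split(payload)
    segments: list[tuple[Optional[str], str]] = [(None, parts[0])]
    for i in range(1, len(parts), 2):
        segments.append((parts[i], parts[i + 1]))
    return segments

for ts, text in split_on_timestamps(
    "Like a <00:19.000>big-a <00:19.500>pizza <00:20.000>pie"
):
    print(ts, repr(text))
# None 'Like a '
# 00:19.000 'big-a '
# 00:19.500 'pizza '
# 00:20.000 'pie'
```

At playback time t, the segment whose timestamp is the latest one not after t is the active text; segments before it are previous text and segments after it are future text.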
The following tags are the HTML tags allowed in a cue and require opening and closing tags (e.g., <b>text</b>).
Text marked up with these tags can be formatted in STYLE blocks using the ::cue pseudo-element.
- Italics tag (<i></i>): Italicize the contained text. Example: <i>text</i>
- Bold tag (<b></b>): Bold the contained text. Example: <b>text</b>
- Underline tag (<u></u>): Underline the contained text. Example: <u>text</u>
- Class tag (<c></c>): Add a class to the contained text for selection via CSS. Example: <c.classname>text</c>
- Ruby tag (<ruby></ruby>): Used with ruby text tags to display ruby characters (i.e., small annotative characters above other characters). Example: <ruby>WWW<rt>World Wide Web</rt>oui<rt>yes</rt></ruby>
- Ruby text tag (<rt></rt>): Used with ruby tags to display ruby characters (i.e., small annotative characters above other characters). Example: <ruby>WWW<rt>World Wide Web</rt>oui<rt>yes</rt></ruby>
- Voice tag (<v></v>): Similar to class tag, also used to style the contained text using CSS. Example: <v Bob>text</v>
- Lang tag (<lang></lang>): Used to highlight text that has been marked up as belonging to a particular language or language variant using the format defined in RFC 5646: Tags for Identifying Languages (also known as BCP 47). Example: <lang en-GB>English text as spoken in Great Britain!</lang>
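For uses such as search indexing or transcripts, it can be handy to reduce a tagged payload to plain text, or to list the speakers named in voice tags. The sketch below uses regular expressions rather than a real cue text parser, so it is an approximation; plain_text and voices are hypothetical names.

```python
import html
import re

_TAG = re.compile(r"<[^>]*>")
# A voice tag: "v", optional .class names, whitespace, then the voice name.
_VOICE = re.compile(r"<v(?:\.[^\s>]+)*[ \t]+([^>]+)>")

def plain_text(payload: str) -> str:
    """Drop cue tags, then decode character references such as &amp;."""
    # Tags are removed before decoding so that &lt; is never mistaken for a tag.
    return html.unescape(_TAG.sub("", payload))

def voices(payload: str) -> list[str]:
    """Return the voice names used in <v> tags, in order of appearance."""
    return _VOICE.findall(payload)

print(plain_text("<v Bob>Tom &amp; <b>Jerry</b></v>"))  # Tom & Jerry
print(voices("<v.myclass Kathryn>Yellow!</v>"))         # ['Kathryn']
```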
NOTE blocks
NOTE blocks are optional sections that can be used to add comments to a WebVTT file. They are intended for those reading the file and are not seen by users. For example, you might use them to record author contact details, provide an overview of your structure, or add placeholders for cues that still need to be written.
They can be used anywhere in the WebVTT file after the header.
NOTE blocks may contain newlines but cannot contain two consecutive newlines: that would create a blank line, which indicates the end of the block.
A comment cannot contain the string -->, the ampersand character (&), or the less-than sign (<).
If you wish to use these characters, you need to instead use a character reference such as &amp; for ampersand and &lt; for less-than.
It is also recommended that you use the greater-than escape sequence (&gt;) instead of the greater-than character (>) to avoid confusion with tags.
A comment consists of three parts:
- The string NOTE.
- A space or a new line.
- Zero or more characters other than those noted above.

Here are some examples:

    NOTE This is a single line comment

    NOTE
    This is a simple multi line comment

    NOTE
    One comment that is spanning
    more than one line.

    NOTE You can also make a comment
    across more than one line this way.

    NOTE TODO I might add a line to indicate work that still has to be done.

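Because NOTE blocks follow a fixed shape, a tool can collect them, for example to list outstanding TODO comments. In this Python sketch, extract_notes is a hypothetical helper that handles both forms (text on the NOTE line, or starting on the next line).

```python
import re

def extract_notes(vtt_text: str) -> list[str]:
    """Return the comment text of every NOTE block in a WebVTT file."""
    notes = []
    normalized = vtt_text.replace("\r\n", "\n").strip("\n")
    for block in re.split(r"\n{2,}", normalized):
        first, _, rest = block.partition("\n")
        if first == "NOTE":
            # NOTE alone on its line: the comment starts on the next line.
            notes.append(rest)
        elif first.startswith(("NOTE ", "NOTE\t")):
            # NOTE followed by a space or tab: the comment starts here.
            text = first[5:]
            notes.append(text + ("\n" + rest if rest else ""))
    return notes

sample = (
    "WEBVTT\n\n"
    "NOTE This is a single line comment\n\n"
    "NOTE\nOne comment that is spanning\nmore than one line.\n\n"
    "00:01.000 --> 00:04.000\nNever drink liquid nitrogen.\n"
)
print(extract_notes(sample))
# ['This is a single line comment', 'One comment that is spanning\nmore than one line.']
```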
STYLE blocks
STYLE blocks are optional sections that can be used to embed CSS styling of cues within a WebVTT file.
Note that these are used to style the appearance and size of the cues, but not their position and layout, which are controlled by the Cue settings.
Note: WebVTT cues can also be matched by CSS styles loaded by the associated document embedding the video/audio element.
STYLE blocks must appear before any cue blocks in the file.
Each block consists of the following lines:
- The string STYLE followed by zero or more space or tab characters, and then a newline.
- A string defining the CSS styles to match and apply, using the ::cue pseudo-element.

The block cannot contain the string -->.
It may contain newlines but cannot contain two consecutive newlines: that would create a blank line, which indicates the end of the block.
A simple WebVTT file with two STYLE blocks is shown below.
This uses ::cue to apply a background and text color to all cue text, and a different text color just to text that is tagged with <b></b> tags.

    WEBVTT

    STYLE
    ::cue {
      background-image: linear-gradient(to bottom, dimgray, lightgray);
      color: papayawhip;
    }
    /* Style blocks cannot use blank lines nor "dash dash greater than" */

    NOTE comment blocks can be used between style blocks.

    STYLE
    ::cue(b) {
      color: peachpuff;
    }

    00:00:00.000 --> 00:00:10.000
    - Hello <b>world</b>.

    NOTE style blocks cannot appear after the first cue.

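When STYLE blocks are generated by a tool, the two content restrictions above (no blank lines, no "-->") are easy to violate by accident, for example by pasting in formatted CSS. A minimal guard might look like this; make_style_block is a hypothetical helper, and it does not validate the CSS itself.

```python
def make_style_block(css: str) -> str:
    """Wrap CSS in a STYLE block, rejecting content the format forbids."""
    css = css.replace("\r\n", "\n").strip("\n")
    if "-->" in css:
        raise ValueError('STYLE blocks cannot contain "-->"')
    if "\n\n" in css:
        # A blank line would end the block early.
        raise ValueError("STYLE blocks cannot contain blank lines")
    return "STYLE\n" + css + "\n"

print(make_style_block("::cue {\n  color: yellow;\n}"))
# STYLE
# ::cue {
#   color: yellow;
# }
```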
Note: There are live examples demonstrating many of the following cases in More cue styling examples in WebVTT API.
Match all cue payload text
Match on all cue payload text using ::cue.
For example, the following STYLE block would match all cue text and color it yellow.

    STYLE
    ::cue {
      color: yellow;
    }

Match a tag type
Match cue text marked up with particular cue payload text tags, such as c, i, b, u, ruby, rt, v, and lang, by specifying the tag in ::cue() as a type selector.
For example, the following block would color cue payload text marked up with lang yellow, and text marked up with any of the other tags red.

    STYLE
    ::cue(c),
    ::cue(i),
    ::cue(b),
    ::cue(u),
    ::cue(ruby),
    ::cue(rt),
    ::cue(v) {
      color: red;
    }
    ::cue(lang) {
      color: yellow;
    }

Match a class selector
Match all tags marked up using a class selector in ::cue().
The STYLE block in the following WebVTT file would match all the text after it, because all the tags have the myclass class.

    WEBVTT

    STYLE
    ::cue(.myclass) {
      color: yellow;
    }

    00:00:00.000 --> 00:00:08.000
    <c.myclass>Yellow!</c>
    <i.myclass>Yellow!</i>
    <u.myclass>Yellow!</u>
    <b.myclass>Yellow!</b>
    <u.myclass>Yellow!</u>
    <ruby.myclass>Yellow! <rt.myclass>Yellow!</rt></ruby>
    <v.myclass Kathryn>Yellow!</v>
    <lang.myclass en>Yellow!</lang>

To select a particular tag and class you must specify both in ::cue():

    STYLE
    ::cue(b.myclass) {
      color: yellow;
    }

Match an attribute
Cue payload text marked up with a particular tag and attribute can be matched using an attribute selector.
For example, consider the following WebVTT file, which has text marked up using the v and lang cue payload text tags, using attributes to specify the particular voice ("Salame") and languages.

    WEBVTT

    STYLE
    ::cue([lang="en-US"]) {
      color: yellow;
    }
    ::cue(lang[lang="en-GB"]) {
      color: cyan;
    }
    ::cue(v[voice="Salame"]) {
      color: lime;
    }

    00:00:00.000 --> 00:00:08.000
    Yellow!

    00:00:08.000 --> 00:00:16.000
    <lang en-GB>Cyan!</lang>

    00:00:16.000 --> 00:00:24.000
    I like <v Salame>lime.</v>

Match using pseudo-classes
The previous example styled text for a particular language using attribute matching.
You can also match languages using the :lang() pseudo-class, as demonstrated by the STYLE block below.

    STYLE
    ::cue(:lang(en)) {
      color: yellow;
    }
    ::cue(:lang(en-GB)) {
      color: cyan;
    }

You can similarly match with the :past and :future pseudo-classes, to provide a karaoke-like experience.

    video::cue(:past) {
      color: yellow;
    }
    video::cue(:future) {
      color: cyan;
    }

Other pseudo-classes such as :link, :nth-last-child, and :nth-child should work similarly.
Match a cue id
Match against a particular cue identifier by specifying it as an ID selector (#id) inside ::cue().
Note: At time of writing this does not appear to be supported in any of the main browsers.
For example, the following WebVTT file should style the cue with identifier cue1 in green.

    WEBVTT

    STYLE
    ::cue(#cue1) {
      color: green;
    }

    cue1
    00:00:00.000 --> 00:00:08.000
    Green!

Note that escape sequences are used in WebVTT CSS in the same way as in HTML pages. The example below shows how to escape spaces in a cue identifier:

    WEBVTT

    STYLE
    ::cue(#transcription\ credits) {
      color: red;
    }

    transcription credits
    00:04.000 --> 00:05.000
    Transcribed by Célestes™

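If cue identifiers come from user data, the escaping can be automated. The following Python sketch is a simplified version of CSS identifier escaping, written for illustration; css_escape_ident is a hypothetical name, and edge cases such as a leading hyphen followed by a digit are deliberately not handled.

```python
def css_escape_ident(value: str) -> str:
    """Escape a cue identifier for use after '#' in a ::cue() selector."""
    out = []
    for i, ch in enumerate(value):
        if not ch.isascii():
            out.append(ch)  # non-ASCII characters are allowed as-is
        elif ch.isalpha() or ch in "_-" or (ch.isdigit() and i > 0):
            out.append(ch)
        elif ch.isdigit() or ord(ch) < 0x20 or ord(ch) == 0x7F:
            # A leading digit or control character needs a hex escape,
            # terminated by a space.
            out.append(f"\\{ord(ch):x} ")
        else:
            out.append("\\" + ch)  # e.g. a space becomes "\ "
    return "".join(out)

print(css_escape_ident("transcription credits"))  # transcription\ credits
print("::cue(#" + css_escape_ident("cue1") + ")")  # ::cue(#cue1)
```

Numeric identifiers such as 1, which are common in subtitle files, need the hex form: css_escape_ident("1") returns a backslash, 31, and a trailing space.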
Specifications
Specification |
---|
WebVTT: The Web Video Text Tracks Format |
See also
- The CSS ::cue and ::cue() pseudo-elements