~/2018/08/12

Fun with subtitles

I know it's not really studying, but in the pursuit of learning Japanese I try to listen to Japanese media in my downtime too. Lately that involves watching a lot of J-drama with English subtitles and being frustrated that I can't have Japanese subtitles at the same time.

If you happen to have media with high quality subtitles in both your target and native languages, there are a few things you can do about it.

Extract subtitles from a container format

If the subtitles are bundled in a container format like MKV, first list the subtitle tracks available for extraction with ffmpeg -i whatever.mkv. Take note of the stream IDs and use ffmpeg's -map flag to pull out each subtitle track that you want.

$ for i in *.mkv; do ffmpeg -i "$i" -map 0:s:25 -c copy "$i".jp.srt; ffmpeg -i "$i" -map 0:s:2 -c copy "$i".en.srt; done

Those stream IDs are only valid for the file I was working on; finding them took a bit of trial and error. Don't give up!
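
A minimal sketch of a less manual way to find the right index, assuming ffprobe (shipped with ffmpeg) is installed and the subtitle streams carry language tags such as jpn or eng, which not every file has. The function names are made up for illustration. Note that the N in -map 0:s:N counts subtitle streams only, starting from 0, so it is a stream's position in this listing rather than its absolute index.

```shell
# Hypothetical helper: print one "absolute_index,language" line per
# subtitle stream in the file given as $1.
list_subtitle_streams() {
    ffprobe -v error -select_streams s \
        -show_entries stream=index:stream_tags=language \
        -of csv=p=0 "$1"
}

# Hypothetical helper: read that listing on stdin and print the 0-based
# position of the first stream tagged with language $1.
# That position is the N to use in "-map 0:s:N".
# Exits non-zero if no stream carries the requested tag.
relative_subtitle_index() {
    awk -F, -v lang="$1" '
        $2 == lang { print NR - 1; found = 1; exit }
        END { exit !found }
    '
}
```

For example, list_subtitle_streams whatever.mkv | relative_subtitle_index jpn would print the number to put after 0:s: for the first Japanese track, if there is one.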

Creating bilingual subtitles

With the subtitles available as separate tracks, the next step is to mash them together. I tried two tools for this:

substudy

substudy [1] is part of a suite of CLI applications called subtitles-rs. It seemed promising based on the examples in the README:

$ for i in *.mkv; do substudy combine "$i.en.srt" "$i.jp.srt" > "$i.bilingual.srt"; done

Unfortunately, it didn't work for me. I encountered errors like this for many of the files:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ErrorMessage { msg: "Cannot truncate time period Period { begin: 1735.566, end: 1742.3 } at 1735.567" }', libcore/result.rs:945:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.

Refer to this issue for more details. Apparently it's possible to work around it but it requires manual intervention, and that's where I stopped looking into it.
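
Because the one-liner redirects straight into the .bilingual.srt file, a failed run leaves an empty or partial file behind. A minimal sketch of a more defensive loop, assuming substudy exits with a non-zero status when it panics; the function name and the .tmp and .combine.log suffixes are made up for illustration.

```shell
# Hypothetical wrapper: combine subtitles for every .mkv in the current
# directory, carrying on when substudy fails on one of them.
combine_all() {
    failed=0
    for i in *.mkv; do
        [ -e "$i" ] || continue               # the glob matched nothing
        tmp="$i.bilingual.srt.tmp"
        if substudy combine "$i.en.srt" "$i.jp.srt" > "$tmp" 2> "$i.combine.log"; then
            mv "$tmp" "$i.bilingual.srt"      # keep output from clean runs only
            rm -f "$i.combine.log"
        else
            rm -f "$tmp"
            echo "combine failed: $i (see $i.combine.log)" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]                        # non-zero status if anything failed
}
```

Running combine_all in the folder with the videos then leaves a .combine.log next to each file that still needs manual attention.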

dualsub

Due to the issues I encountered with substudy, I ended up using dualsub [2] to combine the subtitles.

It's a GUI Java program, so it can't be scripted easily, but you can queue up a batch of files to process by dropping files from your file browser onto the window.

DualSub user interface.
Example of the output.

Using these files to create Anki decks

Depending on your tastes, you might prefer to use subs2srs [3] for this because it can optionally extract short video clips. Unbelievably, it works well in WINE. At several points during my trial export the UI became unresponsive, but just give it time and it will do its thing.

The user interface.

Even though I found subs2srs more featureful (videos probably help greatly with remembering), I ended up using substudy for this because it's more easily scriptable:

$ for i in *.mkv; do substudy export csv "$i" "$i.en.srt" "$i.jp.srt"; done

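Now if you're running this on multiple files you have another challenge: combining every cards.csv so that you only have to import one file. To do this, discard the first (CSV header) line of each file and concatenate the result. GNU sed's -s flag makes 1d apply to every file separately instead of only to the first one, and writing to a .txt file keeps the output from matching the '*cards.csv' pattern itself:

$ find . -type f -iname '*cards.csv' -print0 | xargs --null sed -s 1d > all-cards.txt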

You then need to prepend the header of one of the existing files, which can be found like this:

$ find -type f -iname '*cards.csv' -print0 -quit | xargs --null head -n 1
sound,time,source,image,foreign_curr,native_curr,foreign_prev,native_prev,foreign_next,native_next
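
Reusing one header for every file only makes sense if all the exports share it. A minimal sketch of a check for that, assuming the exports all live under the current directory; the function name is made up for illustration.

```shell
# Hypothetical check: succeed only if every *cards.csv under the current
# directory starts with exactly the same header line.
headers_match() {
    count=$(find . -type f -iname '*cards.csv' -exec head -n 1 {} \; \
        | sort -u | wc -l | tr -d ' ')
    [ "$count" -eq 1 ]
}
```

Something like headers_match || echo "headers differ" is enough to catch a mismatch before importing.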

So the full command to combine the cards is this:

$ { find . -type f -iname '*cards.csv' -print0 -quit | xargs --null head -n 1; find . -type f -iname '*cards.csv' -print0 | xargs --null sed -s 1d; } >| all-cards.txt
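
The same merge can also be sketched with a single awk program that keeps the header of the first file and skips it in the rest. This is an alternative technique, not the command above, and the function name is made up for illustration. It assumes the file list fits into one awk invocation; with a very large number of files find would start awk more than once and the header would repeat.

```shell
# Hypothetical helper: print every *cards.csv under the current directory
# on stdout, keeping the header line from the first file only.
# FNR is the line number within the current file, NR the overall line number.
merge_cards() {
    find . -type f -iname '*cards.csv' \
        -exec awk 'FNR == 1 && NR != 1 { next } { print }' {} +
}
```

Used as merge_cards > all-cards.txt, this also copes with files that lack a trailing newline, because awk terminates every line it prints.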

It's also easier to move the media to one folder before copying it into Anki:

$ mkdir all-media
$ find . -type f \( -iname '*.mp3' -o -iname '*.jpg' \) -print0 | xargs --null -I '{}' mv '{}' ./all-media
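
Moving everything into one folder silently overwrites files that share a name. A minimal sketch of a check to run before the mv, assuming file names without embedded newlines; the function name is made up for illustration.

```shell
# Hypothetical check: print each .mp3/.jpg file name that occurs more than
# once under the current directory. Those files would overwrite each other
# once they are moved into a single folder.
duplicate_media_names() {
    find . -type f \( -iname '*.mp3' -o -iname '*.jpg' \) -exec basename {} \; \
        | sort | uniq -d
}
```

If duplicate_media_names prints nothing, the move is safe as far as name clashes go.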

Then you can just copy everything in that folder to your collection.media folder [4].

Sample of the output.

There are some decks like this on AnkiWeb but they tend to get deleted due to containing copyrighted material. If this is something you're interested in, check out this Blogspot site while it lasts.


  1. Nix expression for subtitles-rs

  2. Nix expression for dualsub

  3. I didn't use Nix to package this because I'm not sure how to package WINE apps yet. Instead, I just downloaded the executable and ran it using wine.

  4. "~/.local/share/Anki2/User 1/collection.media" on my system.