Editing Anki decks with SQLite

I've been studying Kanji on and off for years now, but with more enthusiasm since visiting Japan last December. Soon I'll be taking the JLPT N5 so in preparation I wanted to focus on the Kanji required for the test.

So far, I've been using Anki to study. There are a bunch of shared decks on AnkiWeb that are specifically for N5 preparation, but I would prefer if I could keep using my favourite deck.

For this I needed to find a list of the Kanji to focus on. Unfortunately as of 2010 there is no official list of Kanji required for the test, but there are many sites that have lists composed of Kanji that appeared in previous exams.

I used this one. In order to extract a list from that table, use the browser's developer tools:

  > Array.from(document.querySelectorAll('tr > td + td > a')).map(element => element.textContent).join(' ')
  安 一 飲 右 雨 駅 円 火 花 下 何 会 外 学 間 気 九 休 魚 金 空 月 見 言 古 五 後 午 語 校 口 行 高 国 今 左 三 山 四 子 耳 時 七 車 社 手 週 十 出 書 女 小 少 上 食 新 人 水 生 西 川 千 先 前 足 多 大 男 中 長 天 店 電 土 東 道 読 南 ニ 日 入 年 買 白 八 半 百 父 分 聞 母 北 木 本 毎 万 名 目 友 来 立 六 話"

Then, get excited because that Anki decks (.anki2 files) are just sqlite databases, and the file that you get from AnkiWeb (apkg) is just a Zip archive!

  ❯ file KanjiDamage.apkg
  KanjiDamage.apkg: Zip archive data, at least v2.0 to extract
  ❯ mkdir kanjidamage-hacks && unzip -d kanjidamage-hacks KanjiDamage.apkg
  ❯ ls -lha kanjidamage-hacks
  total 41M
  drwxr-xr-x  2 eqyiel nogroup 2.0K May 21 19:07 .
  drwxr-xr-x 11 eqyiel users     25 May 21 19:07 ..
  -rw-r--r--  1 eqyiel nogroup 7.4K Jan  1 20:58 0
  -rw-r--r--  1 eqyiel nogroup 2.0K Jan  1 20:58 1
  -rw-r--r--  1 eqyiel nogroup  39K Jan  1 20:58 10
  -rw-r--r--  1 eqyiel nogroup 7.4K Jan  1 20:58 100
  -rw-r--r--  1 eqyiel nogroup 7.3K Jan  1 20:58 1000
  -rw-r--r--  1 eqyiel nogroup 6.9K Jan  1 20:58 1001
  -rw-r--r--  1 eqyiel nogroup 8.3K Jan  1 20:58 1002
  -rw-r--r--  1 eqyiel nogroup 2.0K Jan  1 20:58 1003
  -rw-r--r--  1 eqyiel nogroup  11K Jan  1 20:58 1004
  -rw-r--r--  1 eqyiel nogroup 2.4K Jan  1 20:58 1005
  -rw-r--r--  1 eqyiel nogroup 2.9K Jan  1 20:58 1006
  -rw-r--r--  1 eqyiel nogroup 8.2K Jan  1 20:58 1007
  -rw-r--r--  1 eqyiel nogroup 5.4K Jan  1 20:58 1008
  -rw-r--r--  1 eqyiel nogroup 4.8K Jan  1 20:58 1009
  ... SNIP ...
  -rw-r--r--  1 eqyiel nogroup 9.2K Jan  1 20:58 998
  -rw-r--r--  1 eqyiel nogroup 8.3K Jan  1 20:58 999
  -rw-r--r--  1 eqyiel nogroup 9.6M May 21 00:10 collection.anki2
  -rw-------  1 eqyiel nogroup  46K May 21 00:10 media
  ❯ file kanjidamage-hacks/collection.anki2
  kanjidamage-hacks/collection.anki2: SQLite 3.x database, last written using SQLite version 3022000

Using the sqlite CLI you can delete all but the relevant entries like so:

  ❯ sqlite3 kanjidamage-hacks/collection.anki2
  SQLite version 3.22.0 2018-01-22 18:45:57
  Enter ".help" for usage hints.
  sqlite> select count(*) from notes;
  sqlite> select count(*) from notes where sfld in ("安", "一", "飲", "右", "雨", "駅", "円", "火", "花", "下", "何", "会", "外", "学", "間", "気", "九", "休", "魚", "金", "空", "月", "見", "言", "古", "五", "後", "午", "語", "校", "口", "行", "高", "国", "今", " 左", "三", "山", "四", "子", "耳", "時", "七", "車", "社", "手", "週", "十", "出", "書", "女", "小", "少", "上", "食", "新", "人","水", "生", "西", "川", "千", "先", "前", "足", "多", "大", "男", "中", "長", "天", "店", "電", "土", "東", "道", "読", "南", "ニ", "日", "入", "年", "買", "白", "八", "半", "百", "父", "分", "聞", "母", "北", "木", "本", "毎", "万", "名", "目", "友", "来", "立", "六", "話");
  sqlite> delete from notes where sfld not in ("安", "一", "飲", "右", "雨", "駅", "円", "火", "花", "下", "何", "会", "外", "学", " 間", "気", "九", "休", "魚", "金", "空", "月", "見", "言", "古", "五", "後", "午", "語", "校", "口", "行", "高", "国", "今", "左", "三", "山", "四", "子", "耳", "時", "七", "車", "社", "手", "週", "十", "出", "書", "女", "小", "少", "上", "食", "新", "人","水", "生", "西", "川", "千", "先", "前", "足", "多", "大", "男", "中", "長", "天", "店", "電", "土", "東", "道", "読", "南", "ニ", "日", "入", "年", "買", "白", "八", "半", "百", "父", "分", "聞", "母", "北", "木", "本", "毎", "万", "名", "目", "友", "来", "立", "六", "話");
  sqlite> select count(*) from notes;

And recreate the package file:

  ❯ zip kanjidamage-n5.apkg --junk-paths -r --quiet kanjidamage-hacks

  ❯ file kanjidamage-n5.apkg
  kanjidamage-n5.apkg: Zip archive data, at least v2.0 to extract


I couldn't import the resulting deck into my Anki profile without first deleting my existing copy of that deck. It worked with a fresh profile though, which makes me suspect it's because I didn't change the IDs or anything and so it discarded the imported cards as duplicates.

On the bright side, the frustration I felt from this pushed to try Anki's filtered deck feature.

In order to study I created a filtered deck using the following query:

  deck:KanjiDamage (Kanji:安 OR Kanji:一 OR Kanji:飲 OR Kanji:右 OR Kanji:雨 OR Kanji:駅 OR Kanji:円 OR Kanji:火 OR Kanji:花 OR Kanji:下 OR Kanji:何 OR Kanji:会 OR Kanji:外 OR Kanji:学 OR Kanji:間 OR Kanji:気 OR Kanji:九 OR Kanji:休 OR Kanji:魚 OR Kanji:金 OR Kanji:空 OR Kanji:月 OR Kanji:見 OR Kanji:言 OR Kanji:古 OR Kanji:五 OR Kanji:後 OR Kanji:午 OR Kanji:語 OR Kanji:校 OR Kanji:口 OR Kanji:行 OR Kanji:高 OR Kanji:国 OR Kanji:今 OR Kanji:左 OR Kanji:三 OR Kanji:山 OR Kanji:四 OR Kanji:子 OR Kanji:耳 OR Kanji:時 OR Kanji:七 OR Kanji:車 OR Kanji:社 OR Kanji:手 OR Kanji:週 OR Kanji:十 OR Kanji:出 OR Kanji:書 OR Kanji:女 OR Kanji:小 OR Kanji:少 OR Kanji:上 OR Kanji:食 OR Kanji:新 OR Kanji:人 OR Kanji:水 OR Kanji:生 OR Kanji:西 OR Kanji:川 OR Kanji:千 OR Kanji:先 OR Kanji:前 OR Kanji:足 OR Kanji:多 OR Kanji:大 OR Kanji:男 OR Kanji:中 OR Kanji:長 OR Kanji:天 OR Kanji:店 OR Kanji:電 OR Kanji:土 OR Kanji:東 OR Kanji:道 OR Kanji:読 OR Kanji:南 OR Kanji:ニ OR Kanji:日 OR Kanji:入 OR Kanji:年 OR Kanji:買 OR Kanji:白 OR Kanji:八 OR Kanji:半 OR Kanji:百 OR Kanji:父 OR Kanji:分 OR Kanji:聞 OR Kanji:母 OR Kanji:北 OR Kanji:木 OR Kanji:本 OR Kanji:毎 OR Kanji:万 OR Kanji:名 OR Kanji:目 OR Kanji:友 OR Kanji:来 OR Kanji:立 OR Kanji:六 OR Kanji:話)

It worked pretty well, but the search was slow (the UI froze up for a long while) so I wouldn't want to do this for more than a hundred or so Kanji.

One other gotcha with filtered decks is that they're transient, the cards don't stick around after being reviewed. So if you want to review all of the same cards for in each cramming session, you'll have to recreate the filtered deck each time.

I will definitely be revisiting this for round two before taking the N4.