~/2018/06/10

Editing Anki decks with SQLite

I've been studying Kanji on and off for years now, but with more enthusiasm since visiting Japan last December. Soon I'll be taking the JLPT N5 so in preparation I wanted to focus on the Kanji required for the test.

So far, I've been using Anki to study. There are a bunch of shared decks on AnkiWeb that are specifically for N5 preparation, but I would prefer if I could keep using my favourite deck.

For this I needed to find a list of the Kanji to focus on. Unfortunately as of 2010 there is no official list of Kanji required for the test, but there are many sites that have lists composed of Kanji that appeared in previous exams.

I used this one. In order to extract a list from that table, use the browser's developer tools:

> Array.from(document.querySelectorAll('tr > td + td > a')).map(element => element.textContent).join(' ')
安 一 飲 右 雨 駅 円 火 花 下 何 会 外 学 間 気 九 休 魚 金 空 月 見 言 古 五 後 午 語 校 口 行 高 国 今 左 三 山 四 子 耳 時 七 車 社 手 週 十 出 書 女 小 少 上 食 新 人 水 生 西 川 千 先 前 足 多 大 男 中 長 天 店 電 土 東 道 読 南 ニ 日 入 年 買 白 八 半 百 父 分 聞 母 北 木 本 毎 万 名 目 友 来 立 六 話"

Then, get excited because that Anki decks (.anki2 files) are just sqlite databases, and the file that you get from AnkiWeb (apkg) is just a Zip archive!

~/downloads
❯ file KanjiDamage.apkg
KanjiDamage.apkg: Zip archive data, at least v2.0 to extract
~/downloads
❯ mkdir kanjidamage-hacks && unzip -d kanjidamage-hacks KanjiDamage.apkg
~/downloads
❯ ls -lha kanjidamage-hacks
total 41M
drwxr-xr-x  2 eqyiel nogroup 2.0K May 21 19:07 .
drwxr-xr-x 11 eqyiel users     25 May 21 19:07 ..
-rw-r--r--  1 eqyiel nogroup 7.4K Jan  1 20:58 0
-rw-r--r--  1 eqyiel nogroup 2.0K Jan  1 20:58 1
-rw-r--r--  1 eqyiel nogroup  39K Jan  1 20:58 10
-rw-r--r--  1 eqyiel nogroup 7.4K Jan  1 20:58 100
-rw-r--r--  1 eqyiel nogroup 7.3K Jan  1 20:58 1000
-rw-r--r--  1 eqyiel nogroup 6.9K Jan  1 20:58 1001
-rw-r--r--  1 eqyiel nogroup 8.3K Jan  1 20:58 1002
-rw-r--r--  1 eqyiel nogroup 2.0K Jan  1 20:58 1003
-rw-r--r--  1 eqyiel nogroup  11K Jan  1 20:58 1004
-rw-r--r--  1 eqyiel nogroup 2.4K Jan  1 20:58 1005
-rw-r--r--  1 eqyiel nogroup 2.9K Jan  1 20:58 1006
-rw-r--r--  1 eqyiel nogroup 8.2K Jan  1 20:58 1007
-rw-r--r--  1 eqyiel nogroup 5.4K Jan  1 20:58 1008
-rw-r--r--  1 eqyiel nogroup 4.8K Jan  1 20:58 1009
... SNIP ...
-rw-r--r--  1 eqyiel nogroup 9.2K Jan  1 20:58 998
-rw-r--r--  1 eqyiel nogroup 8.3K Jan  1 20:58 999
-rw-r--r--  1 eqyiel nogroup 9.6M May 21 00:10 collection.anki2
-rw-------  1 eqyiel nogroup  46K May 21 00:10 media
~/downloads
❯ file kanjidamage-hacks/collection.anki2
kanjidamage-hacks/collection.anki2: SQLite 3.x database, last written using SQLite version 3022000

Using the sqlite CLI you can delete all but the relevant entries like so:

~/downloads
❯ sqlite3 kanjidamage-hacks/collection.anki2
SQLite version 3.22.0 2018-01-22 18:45:57
Enter ".help" for usage hints.
sqlite> select count(*) from notes;
1758
sqlite> select count(*) from notes where sfld in ("安", "一", "飲", "右", "雨", "駅", "円", "火", "花", "下", "何", "会", "外", "学", "間", "気", "九", "休", "魚", "金", "空", "月", "見", "言", "古", "五", "後", "午", "語", "校", "口", "行", "高", "国", "今", " 左", "三", "山", "四", "子", "耳", "時", "七", "車", "社", "手", "週", "十", "出", "書", "女", "小", "少", "上", "食", "新", "人","水", "生", "西", "川", "千", "先", "前", "足", "多", "大", "男", "中", "長", "天", "店", "電", "土", "東", "道", "読", "南", "ニ", "日", "入", "年", "買", "白", "八", "半", "百", "父", "分", "聞", "母", "北", "木", "本", "毎", "万", "名", "目", "友", "来", "立", "六", "話");
102
sqlite> delete from notes where sfld not in ("安", "一", "飲", "右", "雨", "駅", "円", "火", "花", "下", "何", "会", "外", "学", " 間", "気", "九", "休", "魚", "金", "空", "月", "見", "言", "古", "五", "後", "午", "語", "校", "口", "行", "高", "国", "今", "左", "三", "山", "四", "子", "耳", "時", "七", "車", "社", "手", "週", "十", "出", "書", "女", "小", "少", "上", "食", "新", "人","水", "生", "西", "川", "千", "先", "前", "足", "多", "大", "男", "中", "長", "天", "店", "電", "土", "東", "道", "読", "南", "ニ", "日", "入", "年", "買", "白", "八", "半", "百", "父", "分", "聞", "母", "北", "木", "本", "毎", "万", "名", "目", "友", "来", "立", "六", "話");
sqlite> select count(*) from notes;
102

And recreate the package file:

~/downloads
❯ zip kanjidamage-n5.apkg --junk-paths -r --quiet kanjidamage-hacks

~/downloads
❯ file kanjidamage-n5.apkg
kanjidamage-n5.apkg: Zip archive data, at least v2.0 to extract

Caveats

I couldn't import the resulting deck into my Anki profile without first deleting my existing copy of that deck. It worked with a fresh profile though, which makes me suspect it's because I didn't change the IDs or anything and so it discarded the imported cards as duplicates.

On the bright side, the frustration I felt from this pushed to try Anki's filtered deck feature.

In order to study I created a filtered deck using the following query:

deck:KanjiDamage (Kanji:安 OR Kanji:一 OR Kanji:飲 OR Kanji:右 OR Kanji:雨 OR Kanji:駅 OR Kanji:円 OR Kanji:火 OR Kanji:花 OR Kanji:下 OR Kanji:何 OR Kanji:会 OR Kanji:外 OR Kanji:学 OR Kanji:間 OR Kanji:気 OR Kanji:九 OR Kanji:休 OR Kanji:魚 OR Kanji:金 OR Kanji:空 OR Kanji:月 OR Kanji:見 OR Kanji:言 OR Kanji:古 OR Kanji:五 OR Kanji:後 OR Kanji:午 OR Kanji:語 OR Kanji:校 OR Kanji:口 OR Kanji:行 OR Kanji:高 OR Kanji:国 OR Kanji:今 OR Kanji:左 OR Kanji:三 OR Kanji:山 OR Kanji:四 OR Kanji:子 OR Kanji:耳 OR Kanji:時 OR Kanji:七 OR Kanji:車 OR Kanji:社 OR Kanji:手 OR Kanji:週 OR Kanji:十 OR Kanji:出 OR Kanji:書 OR Kanji:女 OR Kanji:小 OR Kanji:少 OR Kanji:上 OR Kanji:食 OR Kanji:新 OR Kanji:人 OR Kanji:水 OR Kanji:生 OR Kanji:西 OR Kanji:川 OR Kanji:千 OR Kanji:先 OR Kanji:前 OR Kanji:足 OR Kanji:多 OR Kanji:大 OR Kanji:男 OR Kanji:中 OR Kanji:長 OR Kanji:天 OR Kanji:店 OR Kanji:電 OR Kanji:土 OR Kanji:東 OR Kanji:道 OR Kanji:読 OR Kanji:南 OR Kanji:ニ OR Kanji:日 OR Kanji:入 OR Kanji:年 OR Kanji:買 OR Kanji:白 OR Kanji:八 OR Kanji:半 OR Kanji:百 OR Kanji:父 OR Kanji:分 OR Kanji:聞 OR Kanji:母 OR Kanji:北 OR Kanji:木 OR Kanji:本 OR Kanji:毎 OR Kanji:万 OR Kanji:名 OR Kanji:目 OR Kanji:友 OR Kanji:来 OR Kanji:立 OR Kanji:六 OR Kanji:話)

It worked pretty well, but the search was slow (the UI froze up for a long while) so I wouldn't want to do this for more than a hundred or so Kanji.

One other gotcha with filtered decks is that they're transient, the cards don't stick around after being reviewed. So if you want to review all of the same cards for in each cramming session, you'll have to recreate the filtered deck each time.

I will definitely be revisiting this for round two before taking the N4.