Dec 30, 2013

FQL: How to retrieve "furigana" for Japanese user names

Japanese Writing System

The Japanese language is unusual, and its origin is still debated. Its writing system is also unusual in that it combines three scripts:
  • kanji -- logographic characters borrowed from Chinese
  • hiragana -- phonetic characters, originally a simplified form of kanji
  • katakana -- phonetic characters, originally derived from components of kanji

Problem with Kanji

There are more than 2,000 kanji in common use, while hiragana and katakana each have only 46 basic characters. This sheer number of kanji is troublesome for most people.
Another difficulty is that most kanji have two or more readings: the reading derived from Chinese -- on-yomi (音読み) -- and one or more native Japanese readings -- kun-yomi (訓読み). As a result, when kanji are combined in a personal name, we often can't tell which reading is intended.

Hiragana and Katakana as Reading Aid

With experience, people can tell how to pronounce common personal names, but determining the reading programmatically requires a large dictionary and seems almost impossible. Since hiragana and katakana are phonetic, we usually use them to indicate the pronunciation. Hiragana or katakana used explicitly for this purpose is called furigana (フリガナ). This is why most user registration systems require users to enter furigana along with their names in kanji.

Retrieving Furigana with FQL

In 2012, I implemented Facebook social login for my service and found it difficult to retrieve the furigana of a registering user's name. It was really frustrating. Social login should simplify both the user experience and the source code, but if we can't retrieve furigana, we have to make users enter it manually, which I think defeats the purpose in the first place. I went through the entire Facebook Graph API and FQL documentation and finally found a way to do it.
There are columns called sort_first_name and sort_last_name on the user table, and these columns return the furigana for the first name and last name. The minimal query is as follows:
SELECT first_name, sort_first_name, last_name, sort_last_name, name FROM user WHERE uid = 44007581
Depending on your Facebook app settings, you may have to add the locale=ja_JP parameter to the request.
curl -X GET 'https://graph.facebook.com/fql?q=SELECT+first_name%2C+sort_first_name%2C+last_name%2C+sort_last_name%2Cname+FROM+user+WHERE+uid+%3D+44007581&locale=ja_JP'
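The request URL can also be assembled in Perl rather than escaped by hand. This is only a minimal sketch using no modules beyond the pragmas: uri_escape_ascii and build_fql_url are hypothetical helper names, not part of any library, and the query is assumed to be plain ASCII. It encodes spaces as %20 where the curl example uses +; both forms are accepted in a query string.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Percent-encode everything except RFC 3986 unreserved characters.
# Assumption: the input is plain ASCII, as the FQL query is here.
sub uri_escape_ascii {
    my ($str) = @_;
    $str =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $str;
}

# Hypothetical helper: builds the FQL endpoint URL with q and locale.
sub build_fql_url {
    my ($query, $locale) = @_;
    return 'https://graph.facebook.com/fql'
         . '?q='      . uri_escape_ascii($query)
         . '&locale=' . uri_escape_ascii($locale);
}

my $query = 'SELECT first_name, sort_first_name, last_name, sort_last_name, name '
          . 'FROM user WHERE uid = 44007581';
print build_fql_url($query, 'ja_JP'), "\n";
```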
Be careful when the user has not entered a Japanese name. This mostly affects Japanese users who registered back when Facebook was available only in English and who may never have registered kanji and furigana. In that case, the sort_*_name columns return Latin characters.
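One way to guard against that fallback is to check whether the returned value is actually kana before trusting it as furigana. The following is a rough sketch, not a complete solution: is_kana and furigana_or_undef are hypothetical helpers, the value is assumed to be a decoded character string (not UTF-8 bytes), and half-width katakana is deliberately not handled.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;    # this source file may contain UTF-8 string literals

# True when $str is non-empty and consists only of characters from the
# Unicode Hiragana (U+3040-U+309F) and Katakana (U+30A0-U+30FF) blocks.
# $str must be a decoded character string, not raw UTF-8 bytes.
sub is_kana {
    my ($str) = @_;
    return 0 unless defined $str && length $str;
    return $str =~ /\A[\x{3040}-\x{30FF}]+\z/ ? 1 : 0;
}

# Hypothetical helper: returns the furigana, or undef when the value looks
# like a Latin-script fallback and the user should be asked to type it in.
sub furigana_or_undef {
    my ($sort_name) = @_;
    return is_kana($sort_name) ? $sort_name : undef;
}
```

With this, furigana_or_undef('ハギワラ') returns the string unchanged, while furigana_or_undef('Hagiwara') returns undef, so the registration form can ask for furigana only in the second case.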

Below is the code I used.
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
use Facebook::OpenGraph;
use Data::Dumper;
use Data::Recursive::Encode;

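# Create a client and run the FQL query against the user table.
my $fb  = Facebook::OpenGraph->new;
my $ret = $fb->fql('SELECT first_name, sort_first_name, last_name, sort_last_name, name FROM user WHERE uid = 44007581');
# Encode the character strings in the result to UTF-8 bytes so that
# Dumper prints readable Japanese instead of \x{...} escapes.
$ret = Data::Recursive::Encode->encode_utf8($ret || +{});
warn Dumper $ret;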
#$VAR1 = {
#          'data' => [
#                      {
#                        'sort_first_name' => 'ゴウ',
#                        'name' => '萩原 豪',
#                        'first_name' => '豪',
#                        'last_name' => '萩原',
#                        'sort_last_name' => 'ハギワラ'
#                      }
#                    ]
#        };
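
If you call the endpoint directly, as in the curl example, you get a JSON body back and have to pull the furigana out yourself. Here is a small sketch of that step using the core JSON::PP module. The sample body is an assumption modelled on the dump above (a real response may carry more keys), and full_furigana is a hypothetical helper name.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;       # the sample below contains UTF-8 string literals
use JSON::PP;   # core module since Perl 5.14

# Sample body modelled on the dump above; not a captured response.
my $body = '{"data":[{"first_name":"豪","sort_first_name":"ゴウ",'
         . '"last_name":"萩原","sort_last_name":"ハギワラ","name":"萩原 豪"}]}';

# Hypothetical helper: returns "<family name furigana> <given name furigana>"
# for the first row, or undef when there is no row or a column is missing.
sub full_furigana {
    my ($json_text) = @_;
    # No ->utf8 here: $json_text is a character string, not raw bytes.
    my $res = JSON::PP->new->decode($json_text);
    my $row = $res->{data}[0] or return undef;
    my ($last, $first) = @{$row}{qw(sort_last_name sort_first_name)};
    return undef unless defined $last && defined $first;
    return "$last $first";
}

binmode STDOUT, ':encoding(UTF-8)';
print full_furigana($body), "\n";    # prints "ハギワラ ゴウ"
```

When the body arrives as raw UTF-8 bytes from an HTTP client, decode_json (or JSON::PP->new->utf8) would be the matching call instead.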