Topic Modeling the Hàn diǎn (汉典)

Colin Allen, Hongliang Luo, Jaimie Murdock, Jianghuai Pu, Xiaohong Wang, Xiaoliang Wang, Wenjing Yuan, Kun Zhao, and Yanjie Zhai, Indiana University

We describe a collaborative effort between Indiana University and Xi'an Jiaotong University to support exploration and interpretation of a corpus of over 18,000 ancient Chinese documents, the 汉典 corpus, which we also refer to as the "Handian" corpus. We describe the corpus and introduce our application of probabilistic topic modeling to it, with attention to the particular challenges posed by modeling ancient Chinese documents. We give a specific example of how the software we have developed can be used to aid discovery and interpretation of themes in the corpus, using a public interface available at http://inphodata.cogs.indiana.edu/handian/.