
A Comparison of Manual and Automatic Constructions of Category Hierarchy for Classifying Large Corpora

We address the problem of dealing with a large collection of data, and investigate the use of automatically constructing a category hierarchy from a given set of categories to improve classification of large corpora. We use two well-known techniques, the partitioning clustering method k-means and a loss function, to create the category hierarchy. k-means is used to cluster the given categories in a hierarchy. To select the proper number of clusters k, we use a loss function, which measures the degree of our disappointment in any differences between the true distribution over inputs and the learner's prediction. Once the optimal k is selected, the procedure is repeated for each cluster. Our evaluation on the 1996 Reuters corpus, which consists of 806,791 documents, shows that automatically constructing the hierarchy improves classification accuracy.
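The recursive procedure summarized above (cluster the categories with k-means, choose the number of clusters k with a criterion, then repeat inside each cluster) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: each category is assumed to be represented by a vector (for example, a centroid of its training documents), and the selection criterion is a hypothetical stand-in (within-cluster squared error plus a fixed penalty per cluster), since the abstract does not give the exact loss. The names `build_hierarchy`, `best_k`, `penalty`, and the toy categories are illustrative only.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(points: List[Vector], k: int, iters: int = 50) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd's k-means with deterministic farthest-point seeding."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Seed the next center at the point farthest from all chosen centers.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    assign: List[int] = []
    for _ in range(iters):
        new_assign = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_assign == assign:
            break  # converged: no point changed cluster
        assign = new_assign
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = mean(members)
    return assign, centers


def loss(points, assign, centers, penalty: float) -> float:
    """HYPOTHETICAL stand-in loss: within-cluster squared error plus penalty * k."""
    sse = sum(sq_dist(p, centers[a]) for p, a in zip(points, assign))
    return sse + penalty * len(centers)


def best_k(points: List[Vector], max_k: int, penalty: float) -> Tuple[int, List[int]]:
    """Try k = 1..max_k and keep the k with the lowest loss (ties go to smaller k)."""
    best = None
    for k in range(1, min(max_k, len(points)) + 1):
        assign, centers = kmeans(points, k)
        value = loss(points, assign, centers, penalty)
        if best is None or value < best[0]:
            best = (value, k, assign)
    return best[1], best[2]


def build_hierarchy(names: List[str], vectors: List[Vector], max_k: int = 4, penalty: float = 1.0) -> list:
    """Recursively split categories; a group that does not split becomes a flat leaf list."""
    k, assign = best_k(list(vectors), max_k, penalty)
    groups = [[i for i, a in enumerate(assign) if a == j] for j in range(k)]
    groups = [g for g in groups if g]  # drop empty clusters
    if len(groups) <= 1:
        return list(names)
    # Repeat the same procedure inside each cluster.
    return [
        build_hierarchy([names[i] for i in g], [vectors[i] for i in g], max_k, penalty)
        for g in groups
    ]


if __name__ == "__main__":
    # Toy 2-D category vectors forming two well-separated groups.
    names = ["grain", "wheat", "crude", "oil"]
    vectors = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(build_hierarchy(names, vectors))
```

With the default penalty the toy categories split into two subtrees, and each pair is then kept together because splitting it further costs more in penalty than it saves in squared error. Raising `penalty` makes the stand-in criterion favour fewer clusters, so a large enough value yields a flat, single-level list.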


Fumiyo Fukumoto and Yoshimi Suzuki, A Comparison of Manual and Automatic Constructions of Category Hierarchy for Classifying Large Corpora. In: Proceedings of CoNLL-2004, Boston, MA, USA, 2004, pp. 65-72.
Last update: May 13, 2003. erikt@uia.ua.ac.be