
Word Sense Discrimination by Clustering Contexts in Vector and Similarity Spaces

This paper systematically compares unsupervised word sense discrimination techniques that cluster the instances of a target word occurring in raw text, using both vector and similarity spaces. The context of each instance is represented as a vector in a high-dimensional feature space. Discrimination is achieved either by clustering these context vectors directly in vector space, or by computing pairwise similarities among the vectors and then clustering in similarity space. We employ two different representations of the context in which a target word occurs. First-order context vectors represent the context of each instance of a target word as a vector of the features that occur in that context. Second-order context vectors are an indirect representation of the context, based on the average of the vectors that represent the words occurring in the context. We evaluate the discriminated clusters in experiments on sense-tagged instances of 24 SENSEVAL-2 words and on the well-known Line, Hard and Serve sense-tagged corpora.
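The two context representations and the two clustering spaces described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy contexts, the binary word features, the co-occurrence counts and every function name below are assumptions chosen for illustration, and the abstract does not say which features, weighting or clustering algorithms the paper uses.

```python
import numpy as np

# Hypothetical toy instances of the target word "line" (not data from the paper).
TARGET = "line"
raw_contexts = [
    "the telephone line was busy",
    "she waited on the phone line",
    "fishing line and a hook",
    "he cast the fishing line",
]
# Drop the target word itself; it occurs in every context and carries no signal.
contexts = [[w for w in s.split() if w != TARGET] for s in raw_contexts]

vocab = sorted({w for c in contexts for w in c})
index = {w: i for i, w in enumerate(vocab)}

def first_order(context):
    """Binary vector marking which features (here: words) occur in the context."""
    v = np.zeros(len(vocab))
    for w in context:
        v[index[w]] = 1.0
    return v

# Word-by-word co-occurrence counts; each row serves as that word's vector.
cooc = np.zeros((len(vocab), len(vocab)))
for c in contexts:
    for a in c:
        for b in c:
            if a != b:
                cooc[index[a], index[b]] += 1.0

def second_order(context):
    """Average of the word vectors of the words that occur in the context."""
    return np.mean([cooc[index[w]] for w in context], axis=0)

def cosine_similarity_matrix(X):
    """Pairwise cosine similarities between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty vectors
    U = X / norms
    return U @ U.T

X1 = np.array([first_order(c) for c in contexts])   # vector space, first order
X2 = np.array([second_order(c) for c in contexts])  # vector space, second order
S1 = cosine_similarity_matrix(X1)                   # similarity space, first order
S2 = cosine_similarity_matrix(X2)                   # similarity space, second order
```

Under these assumptions, a clustering algorithm would take the rows of `X1` or `X2` as input when working in vector space, and the instance-by-instance matrix `S1` or `S2` when working in similarity space.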

Amruta Purandare and Ted Pedersen, Word Sense Discrimination by Clustering Contexts in Vector and Similarity Spaces. In: Proceedings of CoNLL-2004, Boston, MA, USA, 2004, pp. 41-48.

Last update: May 13, 2003. erikt@uia.ua.ac.be