
Finding Representations for Memory-Based Language Learning

Stephan Raaijmakers, raaijmakers@inl.nl

Constructive induction transforms the representation of instances in order to produce a more accurate model of the concept to be learned. For this purpose, a variety of operators has been proposed in the literature, including a Cartesian product operator that forms pairwise higher-order attributes. We study the application of the Cartesian product operator to memory-based language learning, and demonstrate its effect on generalization accuracy and data compression for a number of linguistic classification tasks, using k-nearest neighbor learning algorithms. These results are compared to a baseline approach of backward sequential elimination of attributes. We demonstrate that neither approach consistently outperforms the other, and that attribute elimination can be used to derive compact representations for memory-based language learning without a noticeable loss of generalization accuracy.

Postscript provided by author: http://lcg-www.uia.ac.be/conll99/papers/raaijmakers.ps.gz
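
To make the two representation-change strategies named in the abstract concrete, the Python sketch below forms pairwise higher-order attributes with a Cartesian-product-style join and performs greedy backward sequential elimination guided by leave-one-out accuracy of a simple overlap-metric 1-nearest-neighbor classifier. This is a minimal illustration under assumed choices (the toy PP-attachment-style data, the "^" value-joining convention, and the accept-if-accuracy-does-not-drop criterion are all hypothetical), not the paper's actual implementation or data.

```python
from collections import Counter
from itertools import combinations

def cartesian_pairs(values):
    """Cartesian-product style constructive induction: join every pair of
    attribute values into one higher-order attribute value (assumed scheme)."""
    return [f"{a}^{b}" for a, b in combinations(values, 2)]

def overlap_distance(x, y):
    """Overlap metric for symbolic attributes: count mismatching positions."""
    return sum(a != b for a, b in zip(x, y))

def knn_predict(train, query, k=1):
    """Majority vote over the k training instances nearest to `query`."""
    nearest = sorted(train, key=lambda inst: overlap_distance(inst[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loo_accuracy(data, k=1):
    """Leave-one-out accuracy of the k-NN classifier on `data`."""
    hits = sum(knn_predict(data[:i] + data[i + 1:], x, k) == y
               for i, (x, y) in enumerate(data))
    return hits / len(data)

def backward_elimination(data, k=1):
    """Greedy backward sequential elimination: drop an attribute whenever
    doing so does not lower leave-one-out accuracy (assumed criterion)."""
    keep = list(range(len(data[0][0])))
    best = loo_accuracy(data, k)
    changed = True
    while changed and len(keep) > 1:
        changed = False
        for i in keep:
            trial = [j for j in keep if j != i]
            reduced = [([x[j] for j in trial], y) for x, y in data]
            if loo_accuracy(reduced, k) >= best:
                keep, best, changed = trial, loo_accuracy(reduced, k), True
                break
    return keep

# Toy PP-attachment-style instances: (verb, noun, prep, noun) -> attachment site.
train = [
    (["eat", "pizza", "with", "fork"], "V"),
    (["eat", "pizza", "with", "anchovies"], "N"),
    (["see", "man", "with", "telescope"], "V"),
    (["see", "man", "with", "dog"], "N"),
]
query = ["eat", "pasta", "with", "spoon"]

# Extended representation: original attributes plus pairwise higher-order ones.
ext_train = [(x + cartesian_pairs(x), y) for x, y in train]
ext_query = query + cartesian_pairs(query)

print(knn_predict(train, query, k=1))          # plain representation
print(knn_predict(ext_train, ext_query, k=1))  # with Cartesian-product attributes
print(backward_elimination(train, k=1))        # indices of retained attributes
```

Note the opposite pressures the abstract compares: attribute elimination shrinks the stored representation (hence the data compression reported), while the Cartesian product operator enlarges it with higher-order attributes; the empirical question is which change, if either, helps generalization accuracy.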


This is the abstract of a paper presented at the CoNLL-99 workshop.