Memory-Based Named Entity Recognition using Unannotated Data

We used the memory-based learner TiMBL (Daelemans et al., 2002) to find names in English and German newspaper text. The first system used only the training data and a number of gazetteers. The results show that gazetteers are not beneficial for the English data, whereas they are for the German data. Type-token generalization was also applied, but it too reduced performance. The second system used gazetteers derived from the unannotated corpus, as well as the ratio of capitalized to uncapitalized use of each word. These strategies increased performance.
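The abstract does not spell out how the capitalization feature is computed. Below is a minimal Python sketch, assuming the feature is gathered by counting, over an unannotated tokenized corpus, how often each word (case-folded) appears capitalized. Several details are illustrative assumptions, not details from the paper: the function name `capitalization_ratios`, the choice to skip sentence-initial tokens (whose capitalization is forced by orthography), and the choice to express the ratio as the fraction of capitalized occurrences, which avoids division by zero for words never seen in lowercase.

```python
from collections import Counter
from typing import Dict, Iterable, List


def capitalization_ratios(sentences: Iterable[List[str]]) -> Dict[str, float]:
    """Hypothetical helper: fraction of each word's occurrences that are capitalized.

    Assumption: sentence-initial tokens are skipped, because their
    capitalization says nothing about the word itself.
    """
    capitalized: Counter = Counter()
    total: Counter = Counter()
    for tokens in sentences:
        for token in tokens[1:]:  # skip the sentence-initial position
            key = token.lower()  # count "Bank" and "bank" as the same word
            total[key] += 1
            if token[:1].isupper():
                capitalized[key] += 1
    # Fraction in [0, 1]: 1.0 means the word was always capitalized mid-sentence.
    return {word: capitalized[word] / total[word] for word in total}


corpus = [
    ["The", "bank", "in", "Essen", "opened"],
    ["He", "visited", "Essen", "and", "the", "Bank"],
]
ratios = capitalization_ratios(corpus)
print(ratios["essen"], ratios["bank"], ratios["the"])
```

On this toy corpus, "Essen" is always capitalized mid-sentence (ratio 1.0), while "bank" appears once in each form (ratio 0.5). In a sketch like this, a value close to 1.0 would hint that a word is usually a name, and that value could be passed to the learner as one more feature.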

Fien De Meulder and Walter Daelemans, Memory-Based Named Entity Recognition using Unannotated Data. In: Proceedings of CoNLL-2003, Edmonton, Canada, 2003, pp. 208-211.

Last update: June 11, 2003. erikt@uia.ua.ac.be