Project information
Abstract: 

In this project we investigate the applicability of machine learning techniques, both supervised and unsupervised, to a range of language technology problems for African languages.

Abstract Dutch: 

In dit project onderzoeken we de toepasbaarheid van automatische leermethoden voor taaltechnologische problemen voor Afrikaanse talen.

Project Leader(s): 
Guy De Pauw

Publications + Talks

Chege, K., Ng'ang'a W., Wagacha P. W., De Pauw G., & Mutiga J. (2011). Morphological Analysis of Gĩkũyũ using a Finite State Machine. Proceedings of Conference on Human Language Technology for Development. 112-117.
Hoogeveen, D., & De Pauw G. (2011). CorpusCollie - A Web Corpus Mining Tool for Resource-Scarce Languages. Proceedings of Conference on Human Language Technology for Development. 44-49.
Kituku, B., Wagacha P. W., & De Pauw G. (2011). A Memory-Based Approach to Kĩkamba Named Entity Recognition. Proceedings of Conference on Human Language Technology for Development. 106-111.
De Pauw, G., Maajabu N., & Wagacha P. W. (2010). A Knowledge-Light Approach to Luo Machine Translation and Part-of-Speech Tagging. (De Pauw G., Groenewald H., & de Schryver G.-M., Eds.). Proceedings of the Second Workshop on African Language Technology (AfLaT 2010). 15-20.
Chege, K., Wagacha P. W., De Pauw G., Muchemi L., & Ng'ang'a W. (2010). Developing an Open Source Spell Checker for Gĩkũyũ. (De Pauw G., Groenewald H., & de Schryver G.-M., Eds.). Proceedings of the Second Workshop on African Language Technology (AfLaT 2010). 31-35.
De Pauw, G., & de Schryver G.-M. (2009). African Language Technology: the Data-Driven Perspective. (Lyding V., Ed.). Lesser Used Languages and Computer Linguistics (LULCLII) - Combining efforts to foster computational support of minority languages. 79-96.
De Pauw, G., Wagacha P. W., & de Schryver G.-M. (2009). The SAWA corpus: a parallel corpus English - Swahili. Proceedings of the workshop on Language Technologies for African Languages (AfLaT 2009). 9-16.