The Personae corpus was collected for experiments in Authorship Attribution and Personality Prediction. It consists of 145 Dutch-language essays, each written by a different student in the BA in Linguistics and Literature at the University of Antwerp, Belgium. Each student also took an online MBTI personality test, which makes personality prediction experiments possible. The corpus is controlled for topic, register, genre, age, and education level.
We make available the original texts, a syntactically annotated version of the texts, and the metadata.
The construction of the corpus was made possible by a grant from the Flemish Research Foundation (FWO) for the 'Computational Techniques for Stylometry for Dutch' project.
If you use this dataset in your research, please cite the following paper:
Luyckx, Kim & Daelemans, Walter (2008). Personae, a Corpus for Author and Personality Prediction from Text. In: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008). Marrakech, Morocco.