Computational Linguistics

Can we model language understanding, production, learning, and translation computationally? Computational linguistics research at CLiPS is concerned with the study of computational methods for the representation, acquisition, and use of language knowledge.

We focus on the application of statistical and machine learning methods, trained on corpus data, to explain human language acquisition and processing data, and to develop automatic text analysis systems that are accurate, efficient, and robust enough to be used in practical applications. We develop machine learning algorithms suited to the properties of language data (few regularities, many irregularities and exceptions), and new methodologies for simulating such data.

Our application-oriented research is in the domain of Language Technology, the development of language processing tools to solve concrete problems. The research focus here has been on text mining (extracting knowledge from unstructured text data). We develop new approaches combining machine learning and automatic text analysis to solve generic problems in text mining (automatic summarization, question answering, information extraction, smart search, ontology learning, etc.). We build these generic solutions into prototypes for specific applications. Recently, the group has also developed research initiatives on language technology for African languages, and on Digital Humanities (especially the areas of computational stylometry and language technology for the study of old variants of Dutch).

Voice control of the devices that we use in our daily lives is perceived as a luxury. Often, a remote control is even better suited for home automation because we find it easier to push a button than to speak a command. However, for persons with a physical impairment, pushing a button is not always as easy as it is for most people, and voice control is a viable solution for them. What is perceived as a luxury for most people can actually mean a significant...
AMiCA is an IWT-funded SBO (Strategic Basic Research) pre-project with a societal valorization goal. It is a one-year project preparing a full (four-year) SBO project proposal. The project aims to mine relevant website resources (blogs, chat rooms, and social networks), and to collect, analyse, and integrate large amounts of subjective information using text and image analysis, with the ultimate goal of tracing harmful content automatically.
This project investigates how techniques of statistical relational learning can be used for natural language processing. The focus will be on challenging natural language processing tasks, such as semantic role labeling, where syntactic and semantic dependencies, structured and unstructured data, local and global models, and probabilistic and logical information must be combined with one another. As for statistical relational learning, the emphasis will lie on...
This project proposes, on the one hand, a methodology to (semi-)automate the manual control of peer-to-peer networks and, on the other hand, a methodology for the automatic extraction and analysis of linguistic features of chat language (those associated with age, gender, and deceptive language use). The aim of the DAPHNE project is to develop a software prototype that will support the law enforcement agencies' control of peer-to-peer networks with regard to the illegal distribution of child...
The goal of the deLearyous project is the development of a 3D serious video game where players can practice their communication skills by interacting with a virtual character. The trainee talks to the virtual character by typing in sentences in Dutch. The software then has to interpret these sentences and determine the appropriate reaction for the virtual character. The virtual character's response follows a communication model known as the Interpersonal Circumplex (AKA "Leary's Rose", after...
The goal of this project is to implement a robust, modular system for stylometry and readability research on the basis of existing techniques for automatic text analysis and machine learning, and to develop a web service that allows researchers in the humanities and social sciences to analyze texts with this system. In this way, the project will make recent advances in the computational modeling of style and readability available to researchers.
This project involves making available as web services some of the language processing tools that were developed within the CLiPS computational linguistics group.
FLaReNet aims at developing a common vision of the area of language technology and fostering a European strategy for consolidating the sector, thus enhancing competitiveness at EU level and worldwide. By creating a consensus among major players in the field, the mission of FLaReNet is to identify priorities as well as short, medium, and long-term strategic objectives and provide consensual recommendations in the form of a plan of action for the EC, national organisations, and industry. Through the...
The aim of this project is to develop an exemplar-based model of human sentence parsing that is capable of identifying the relations between the different words of a sentence in a psychologically adequate manner. Exemplar-based models of language processing (Daelemans & Van den Bosch, 2005) explicitly store every language experience in memory. New linguistic tasks are solved in analogy with these stored experiences. This approach can form an alternative to formal-symbolic and connectionist...
The growing overload of textual information available to organizations and professionals hampers effective knowledge management and discovery by increasing the time needed to find relevant information and by causing crucial information to be missed. Especially in the health sciences this is seen as a vexing problem, as the huge and largely unexplored volume of published literature, in combination with structured databases representing experimental data and background knowledge, might lead to...