Projects

Past Projects

The aim of this project is to systematically archive and make available 200 hours of spoken Standard Dutch, produced by 160 Flemish and Dutch teachers of Dutch. The speech collection is highly valuable: several social and linguistic variables were taken into account in composing the corpus, and the recordings are of high (stereo) quality. The corpus can therefore be used for phonetic, phonological, and sociolinguistic purposes.
01/01/2007 - 31/12/2008
The well-known fact that similar information can be expressed in many different ways is one of the major challenges in building robust NLP applications. It is commonly assumed that such applications can be improved with knowledge of how natural language expressions relate to each other, for instance in terms of paraphrases (same semantic content, different wording) or entailments (one expression implied by the other). DAESO investigates the detection of semantic overlap between Dutch sentences...
01/06/2006 - 31/05/2009
The aim of this project is to investigate acoustic-phonetic characteristics of the speech of young congenitally deaf children who received a cochlear implant in their first year of life. In particular, the acoustic characteristics of their babbling will be investigated in order to detect discrepancies with the babbling of hearing infants. In addition, we will analyze the spontaneous speech of these children at the age of six and investigate whether it displays the typical characteristics of "deaf...
01/11/2005 - 30/10/2009
This project studies schwa epenthesis in spoken Standard Dutch.
01/10/2005 - 30/09/2008
Coreference resolution is a key ingredient for the automatic interpretation of text. It has been studied mainly from a linguistic perspective, with an emphasis on establishing potential antecedents for pronouns. Practical applications, such as Information Extraction (IE), summarization and Question Answering (QA), require accurate identification of coreference relations between noun phrases in general. Computational systems for assigning such relations automatically require the availability of...
01/05/2005 - 31/10/2007
Cochlear implants have been used since the 1980s, and by now the technique is well established and implemented all over the world. When the age of implantation is charted, it appears that children are being implanted at a steadily decreasing age: a few years ago, the youngest subjects who received a CI were around two years of age. Nowadays, children are already implanted around their first birthday, and the youngest subjects that have recently been implanted at the University of Antwerp...
01/01/2005 - 31/12/2008
Young children often insert 'fillers' in their first multiword utterances: vocalizations that do not correspond to conventional words. For instance, it is hard to determine the meaning of the syllables [m] and [] in utterance (a). Fillers often have the shape of a syllabic nasal or a schwa, as in utterances (a) and (b). But sometimes they consist of several syllables, as in utterance (c). (a) [m] pick [] flowers (English-learning boy, age 1;6; from Peters and Menn, 1993) (b) [] oiseau [] vole (...
01/10/2004 - 30/09/2007
The goal of the BioMinT project is to develop a generic text mining tool that (1) interprets diverse types of query, (2) retrieves relevant documents from the biological literature, (3) extracts the required information, and (4) outputs the result as a database slot filler or as a structured report. The consortium consists of biologists (University of Manchester, Swiss Institute of Bioinformatics) and data/text mining groups (CLiPS Antwerp, PharmaDM, Austrian Research Institute for AI,...
01/01/2003 - 31/03/2006
Information Extraction (IE) is concerned with extracting relevant data from a collection of documents. During the past decade, several IE systems were developed for corpora of (semi-)structured or even unstructured texts. Those systems were trained using annotated corpora. Annotated data, however, are expensive and difficult to obtain in real-life applications. In this project we therefore focus on the development of IE systems using semi-supervised learning. For this, we use a small (easy to...
01/01/2003 - 30/09/2004
In this project we investigate whether the "all-in-one" strategy currently used in speech recognizers, in which task-specific, syntactic, and lexical knowledge are fused into a single model based on simple formalisms, can be replaced by a modular architecture in which, apart from acoustic-phonetic and intonational features, generic and domain-specific linguistic information sources can also be used.
01/10/2002 - 30/09/2006
MUSA aims at the creation of a multimodal multilingual system that converts audio streams into text transcriptions, translates the transcriptions into other languages and then generates subtitles from these translated transcriptions. MUSA will operate in English, French and Greek. A state-of-the-art Speech Recognition system will be enhanced and improved to meet the project settings. An innovative Machine Translation scenario will be designed that combines a Machine Translation engine with a...
01/09/2002 - 28/02/2005
This project concerns the use of a parallel cluster of PCs for simultaneous genetic optimization of (i) machine learning algorithm parameters, (ii) information source selection, and (iii) classifier combination, in the context of a machine learning approach to applications of language and speech technology.
01/01/2002 - 31/12/2002
The main goal of CLiPS for this project is the application and adaptation of shallow parsing technology for (i) extraction of lexons (ontological relations) from unstructured and semi-structured sources, (ii) evaluation of ontologies, and (iii) adaptation of ontologies (e.g. WordNet) to specific domains. A secondary goal is to investigate the use of ontologies to improve text analysis using shallow parsing.
01/01/2002 - 31/12/2005
The goal of the project is to confront and integrate deductive and inductive approaches to computational linguistics in the area of lexical semantics. Subprojects include the combination of supervised and unsupervised machine learning methods for semantic knowledge acquisition and disambiguation, the incorporation of linguistic semantic knowledge in inductive approaches, and the refinement of existing semantic tag sets with machine learning techniques.
01/01/2002 - 31/12/2005
The objective of this project is to study the tonal structures of a number of Limburgian dialects, as spoken in the Netherlands and Flanders. The goals of the project are: (1) to collect prosodic data for a number of Limburgian dialects and to describe their phonological structures, (2) to exploit existing and newly collected data to study to what extent the specific tonal contrasts are perceptually relevant for listeners, and (3) to investigate whether such tonal differences can influence the...
01/01/2002 - 31/12/2005
The aim of this project is the study of reduction phenomena in spontaneous (= non-read) Standard Dutch. We use speech from the Spoken Dutch Corpus (Corpus Gesproken Nederlands) and speech collected for the VNC project "Variation in the pronunciation of Standard Dutch". A more specific aim is to compare the pronunciation of highly educated speakers without linguistic training with the pronunciation of teachers of Dutch, who are often considered to be prototypical speakers of Standard Dutch. This...
01/10/2001 - 30/09/2005
The aim of the project is to perform empirical investigations to determine whether adequate prosody can be generated on the basis of two methods that have recently shown success in other language processing domains: (a) robust analysis of text by analyses and metrics from information retrieval and information extraction, and (b) advanced machine learning systems and meta learners.
01/01/2001 - 31/12/2004
The project aims to contribute to the development of better products for the automatic verbatim transcription of speech, and for the conversion of these transcriptions to a form that is better adapted to the needs of the end-user. One application, to be examined as a case study, is the generation of subtitles for the benefit of hearing-impaired people. CLiPS will investigate learning techniques for the transcription of out-of-vocabulary items, and statistical techniques for aligning and...
01/10/2000 - 30/09/2004
Optimality Theory (OT) is the central paradigm in current theorizing about phonological acquisition. OT is a deductive model: (a priori) linguistic knowledge is represented in the child's linguistic (grammatical) competence. In this project we explore an empiricist, inductive alternative to this approach. An empiricist, inductive model is defined as a model in which the mental lexicon is central in acquisition. Linguistic knowledge is collected and stored in the lexicon. The contrast between...
01/10/2000 - 30/09/2004
In this project we study auditory development and speech and language acquisition in congenitally deaf children who received a cochlear implant (CI) during their second year of life. Our aim is to systematically investigate the effect of the CI on different aspects of language and speech development: (1) the effect of a CI on the auditory level; (2) the effect of a CI on the articulatory level (the speech); (3) the effect of a CI on language acquisition and communicative development. In essence, we...
01/10/2000 - 31/12/2006