Fourth CLIF Symposium

Symposium on Language and Speech Technology in Flanders

  Location
  Building S — Grauwzusters
  Promotiezaal & patio
  Lange St. Annastraat 7
  Antwerpen

December 9, 2009


 
PROGRAM
 
9.00    Doors
9.20    Opening speech
9.30    Streamlining processing stages in building the parallel corpus DPC (Hans Paulussen)
10.00   Automatic transcription of Flemish broadcast news shows (Kris Demuynck)
10.30   Coffee break
11.00   Automatic speaker recognition (David van Leeuwen)
11.50   Proper name recognition using multilingual acoustic and lexical models (Bert Réveil)
12.20   A Multimodal Approach to Audiovisual Text-to-Speech Synthesis (Wesley Mattheyses)
12.50   Lunch break
14.30   CLARIN -- Language and Speech Infrastructure for Researchers in the Humanities and Social Sciences (Ineke Schuurman)
15.00   Cross-lingual Word Sense Disambiguation (Els Lefever)
15.30   Coffee break
16.00   Alignment of grammatically divergent parses using interlingual MT techniques (Tom Vanallemeersch)
16.30   Computational approaches to creativity (Tom De Smedt)
17.00   Reception: Belgian beers
 
 
INVITED SPEAKERS
 

Automatic speaker recognition

David van Leeuwen — Radboud University Nijmegen and TNO

Automatic speaker recognition is an area of speech technology that has received much attention from speech researchers in recent years. Some consider it the cleanest of all speech-related recognition problems. Although simple in its formulation, the speaker recognition problem appears to have an intricate relation with its application. Text-independent speaker recognition can be seen as a pattern recognition problem in which the features are highly variable sequences related to a single source, and the task is to detect whether that source is of a known identity.

In this presentation, the typical characteristics of the speaker recognition approach are reviewed, and an overview of the machine learning techniques employed is given. Apart from spectral features, which are the most dominant in speech, there are techniques involving linguistic models that can contribute to the discriminability of speakers. These approaches can be effectively combined with baseline systems using simple fusion and calibration techniques. The framework for measuring the performance of the state of the art in text-independent speaker recognition is provided by the regular NIST speaker recognition evaluations.
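
As an illustration of the fusion and calibration step mentioned above, the following is a minimal sketch (in Python) that combines the scores of two hypothetical subsystems with a linear fusion and an affine calibration to log-likelihood ratios; the weights, calibration parameters and threshold are invented for the example and would in practice be trained on held-out trials (e.g. with logistic regression).

    # Minimal sketch of linear score fusion and affine calibration for
    # speaker detection trials.  All weights below are invented for the
    # example; real systems train them on held-out trials.

    def fuse(spectral_score, linguistic_score, w=(0.7, 0.3)):
        """Weighted linear fusion of two subsystem scores."""
        return w[0] * spectral_score + w[1] * linguistic_score

    def calibrate(score, scale=1.2, offset=-0.5):
        """Affine map from a raw fused score to a log-likelihood ratio."""
        return scale * score + offset

    def decide(llr, threshold=0.0):
        """Accept the target hypothesis when the LLR exceeds the threshold."""
        return llr > threshold

    if __name__ == "__main__":
        trials = [(2.1, 1.4), (-0.3, 0.2), (0.9, -1.1)]   # (spectral, linguistic)
        for spec, ling in trials:
            llr = calibrate(fuse(spec, ling))
            print(f"fused LLR = {llr:+.2f} -> {'target' if decide(llr) else 'non-target'}")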

Download presentation (pdf file)

CLARIN -- Language and Speech Infrastructure for Researchers in the Humanities and Social Sciences

Ineke Schuurman — CCL, K.U.Leuven

Download presentation (pdf file)

ABSTRACTS
 

Automatic transcription of Flemish broadcast news shows

Kris Demuynck — ESAT, K.U.Leuven

The continuous and steady improvements made over the years in both the accuracy and the robustness of large vocabulary continuous speech recognition have led to systems that can deal with complex tasks such as the automatic transcription of broadcast news shows. In this presentation, we will describe one such system, built on top of the open source toolkit SPRAAK. Several task- and system-related aspects will be briefly discussed:

  • lexical and morphological issues in Dutch
  • pronunciation modeling
  • text normalization and language modeling
  • speaker segmentation and clustering

The performance and the main causes of error of the current system are analyzed on the N-Best benchmark and on some recent Flemish news shows.
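
As a small illustration of one of the stages listed above, the sketch below shows a toy text normalization step of the kind used when preparing language-model training text; the expansion table and example sentence are invented and do not reflect the normalization actually used in the SPRAAK-based system.

    # Toy text normalization for language-model training text.
    # The expansion table is illustrative only.
    import re

    EXPANSIONS = {
        "dhr": "de heer",
        "mevr": "mevrouw",
        "km": "kilometer",
        "3": "drie",
        "2009": "tweeduizend negen",
    }

    def normalize(line):
        """Lowercase, expand known tokens and drop remaining punctuation."""
        tokens = line.lower().split()
        tokens = [EXPANSIONS.get(t.strip(".,!?"), t) for t in tokens]
        return re.sub(r"[^\w\s]", "", " ".join(tokens))

    print(normalize("Dhr. Peeters reed 3 km door Antwerpen in 2009."))
    # -> "de heer peeters reed drie kilometer door antwerpen in tweeduizend negen"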

Download presentation (pdf file)

Streamlining processing stages in building the parallel corpus DPC

Hans Paulussen — French Linguistics, K.U.Leuven (campus Kortrijk)

The follow-up and coordination of a multilingual corpus project require a different approach from the compilation of monolingual corpora. Unlike the latter type of project, which mainly focuses on successive linear processing stages, multilingual corpora require a parallel follow-up of data processing.

In building the Dutch Parallel Corpus (DPC), we not only had to cope with monitoring the sequential tasks of data acquisition, data processing and data packaging, but we also had to consider the complexities of the different types of data processing: sentence alignment and linguistic annotation, and this for three languages (Dutch, English and French). In order to manage the different aspects of corpus creation, the whole procedure was monitored through an electronic "matrix" (linked with metadata files), which could be updated flexibly on a daily basis, thus facilitating optimisation of the corpus design requirements.

In this talk we present the approach used in DPC to handle the follow-up of the project and the coordination of the different processing stages.
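
A hypothetical, much simplified illustration of such a tracking matrix is sketched below: one status cell per text, language and processing stage, so that overall progress can be queried at any time. The stage names and identifiers are invented; the actual DPC matrix and its metadata links are richer than this.

    # Hypothetical tracking "matrix" for a parallel corpus project:
    # one status cell per (text, language, processing stage).
    STAGES = ["acquired", "aligned", "annotated", "packaged"]
    LANGUAGES = ["nl", "en", "fr"]

    matrix = {}   # (text_id, lang) -> {stage: done?}

    def register(text_id):
        for lang in LANGUAGES:
            matrix[(text_id, lang)] = {stage: False for stage in STAGES}

    def mark_done(text_id, lang, stage):
        matrix[(text_id, lang)][stage] = True

    def progress(stage):
        cells = [row[stage] for row in matrix.values()]
        return sum(cells) / len(cells)

    register("dpc-0001")
    mark_done("dpc-0001", "nl", "acquired")
    mark_done("dpc-0001", "en", "acquired")
    print(f"acquisition progress: {progress('acquired'):.0%}")   # 67%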

Download presentation (pdf file)

Proper name recognition using multilingual acoustic and lexical models

Bert Réveil — DSSP, ELIS, UGent

Utterances of proper names remain a challenge for voice-driven car navigation or directory assistance applications, as they exhibit a lot of pronunciation variation. This is due to the fact that proper names often show archaic spelling or originate (in part) from foreign languages. Furthermore, the above applications usually have to accommodate non-native users.

In order to address and explore this challenge in the context of Dutch Points Of Interest (POI) recognition, two previously proposed approaches were revisited. First, multiple foreign grapheme-to-phoneme (g2p) transcriptions were added to the lexicon of a monolingual proper name recognition system. In a second step, a multilingual acoustic model was introduced. We found that both measures greatly improve the performance of the recognizer, and we analyzed the improvements thoroughly.

However, even though the accuracy gains obtained with our best system were substantial, a cheating experiment with auditorily verified (AV) transcriptions revealed that further significant improvements are possible. Therefore, we are currently deploying so-called phoneme-to-phoneme (p2p) converters that try to transform a set of baseline transcriptions into a pool of transcription variants that lie closer to the “true” AV transcriptions. The first experiments have shown that p2p transcriptions allow us to further improve the recognition accuracy.
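
To make the first of these measures concrete, the sketch below shows a toy recognition lexicon in which each name carries transcription variants produced by several grapheme-to-phoneme converters; the names, phoneme strings and converter labels are invented for the illustration.

    # Toy recognition lexicon with multiple g2p transcription variants
    # per proper name.  Names and phoneme strings are invented examples.
    from collections import defaultdict

    lexicon = defaultdict(set)   # name -> set of (source, transcription)

    def add_variant(name, source, phonemes):
        lexicon[name].add((source, phonemes))

    # Native Dutch transcription plus variants from foreign g2p converters.
    add_variant("Quartier Latin", "nl", "k w a r t i r l a t i n")
    add_variant("Quartier Latin", "fr", "k a r t j e l a t e~")
    add_variant("Charleroi", "nl", "sj a r l @ r o j")
    add_variant("Charleroi", "fr", "sj a r l @ r w a")

    for name, variants in lexicon.items():
        print(name)
        for source, phon in sorted(variants):
            print(f"  [{source}] {phon}")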

Download presentation (zip file with ppt)

A Multimodal Approach to Audiovisual Text-to-Speech Synthesis

Wesley Mattheyses — ETRO, VUB

Audiovisual text-to-speech systems convert a written text into an audiovisual speech signal. Typically, the visual mode of the synthetic speech is synthesized separately from the audio, the latter being either natural or synthesized speech. Mismatches between these two information streams can be perceived and could degrade the quality, which requires experimental exploration. In order to increase the intermodal coherence in synthetic 2D photorealistic speech, we extended the well-known unit selection audio synthesis technique to work with multimodal segments containing original combinations of audio and video.

In this presentation we discuss our synthesis strategy and we summarize the results of listening experiments we conducted.
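
The sketch below gives a schematic impression of unit selection with joint audiovisual segments: each candidate unit carries both an audio and a video feature, and the join cost penalizes discontinuities in both streams at once. The features, weights and the greedy search are invented simplifications and do not reflect the actual system, which would search over real acoustic and visual features.

    # Schematic multimodal unit selection: candidate units carry an audio
    # and a video feature; the join cost covers both streams.  All numbers
    # and weights are invented for the illustration.

    def join_cost(prev, cur, w_audio=0.6, w_video=0.4):
        """Penalize spectral and visual discontinuities at the unit boundary."""
        return (w_audio * abs(prev["audio"] - cur["audio"])
                + w_video * abs(prev["video"] - cur["video"]))

    def select(candidates_per_slot):
        """Greedy left-to-right selection (a real system would use a Viterbi search)."""
        path = [min(candidates_per_slot[0], key=lambda u: u["target_cost"])]
        for slot in candidates_per_slot[1:]:
            path.append(min(slot, key=lambda u: u["target_cost"] + join_cost(path[-1], u)))
        return path

    slots = [
        [{"audio": 1.0, "video": 0.8, "target_cost": 0.1},
         {"audio": 0.4, "video": 0.5, "target_cost": 0.3}],
        [{"audio": 1.1, "video": 0.9, "target_cost": 0.2},
         {"audio": 0.2, "video": 0.1, "target_cost": 0.1}],
    ]
    print(select(slots))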

Download presentation (zip file with ppt and xvid movies)

Cross-lingual Word Sense Disambiguation

Els Lefever and Véronique Hoste — LT3, Hogeschool Gent

We present a multilingual unsupervised Word Sense Disambiguation (WSD) task for a sample of English nouns. The task was formulated within the framework of the SemEval-2010 evaluation exercise. Instead of providing manually sense-tagged examples for each sense of a polysemous noun, our sense inventory is built up on the basis of the Europarl parallel corpus. The multilingual setup involves the translations of a given English polysemous noun in five supported languages, viz. Dutch, French, German, Spanish and Italian.

Organizing this task consists of (a) the manual creation of a multilingual sense inventory for a lexical sample of English nouns and (b) the evaluation of systems on their ability to disambiguate new occurrences of the selected polysemous nouns.

For the creation of the hand-tagged gold standard, all translations of a given polysemous English noun are retrieved in the five languages and clustered by meaning. Human annotators label each instance with the appropriate cluster and their top-3 translations from this cluster. The frequencies of these translations are used to assign weights to all translations in the gold standard. Systems can participate in some of the five bilingual evaluation subtasks and in a multilingual subtask covering all language pairs.

To score the system output, we perform a "best" evaluation (where the credit for each correct guess is divided by the number of guesses) and a more "relaxed" evaluation with a maximum of 10 system guesses (where systems are not penalized for a higher number of guesses). We provide two baselines: the first baseline takes into account the most frequent GIZA++ word alignments, whereas the second baseline uses the most frequent EuroWordNet sense.
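
The two scoring modes can be made concrete with the small sketch below; the gold-standard weights and system guesses are invented, and the official scorer may apply additional normalization.

    # Sketch of the two scoring modes.  gold maps each valid translation to
    # its annotator-frequency weight; the weights and guesses are invented.

    def best_score(guesses, gold):
        """Credit for each correct guess is divided by the number of guesses."""
        return sum(gold.get(g, 0.0) for g in guesses) / len(guesses)

    def relaxed_score(guesses, gold, max_guesses=10):
        """Up to max_guesses guesses, with no penalty for guessing more (within the cap)."""
        return sum(gold.get(g, 0.0) for g in guesses[:max_guesses])

    gold = {"baan": 0.6, "werk": 0.3, "taak": 0.1}   # invented Dutch translations of "job"
    print(f"{best_score(['baan'], gold):.2f}")                        # 0.60
    print(f"{best_score(['baan', 'positie'], gold):.2f}")             # 0.30
    print(f"{relaxed_score(['baan', 'werk', 'positie'], gold):.2f}")  # 0.90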

Download presentation (pdf file)

Alignment of grammatically divergent parses using interlingual MT techniques

Tom Vanallemeersch — Translation Studies, Lessius

Alignment of nodes across parse trees is useful for several purposes, including the creation or tuning of MT systems and computer-assisted translation. Tree alignment approaches combine features such as lexical equivalences, syntactic labels, tree levels and inside/outside scores. A non-trivial problem is the alignment of divergences, such as equivalent words with different syntactic categories, and paraphrases.

We propose an approach for aligning grammatical divergences which is based on interlingual MT techniques from the Eurotra system, abstracts away from surface linguistic properties and creates semantic hypotheses. These divergences involve equivalences between verbs and deverbal nouns (e.g. "during their meeting" and "terwijl ze vergaderden" involve a semantic subject "they"/"ze" and an action "meet"/"vergaderen"), differences in tense and aspect, and differences in diathesis (e.g. passivisation).

For three languages (Dutch, French and English), we create a reference corpus with semantically annotated sentences, parse the sentences and associate subtree patterns with semantic hypotheses. We test the patterns and hypotheses by applying them to parses of Europarl sentence pairs and aligning hypotheses based on their similarity and a bilingual lexicon. We extend the bilingual sentence alignment in the Europarl corpus to a trilingual one in order to align hypotheses between three languages.
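
As a highly simplified illustration of the alignment step, the sketch below greedily pairs semantic hypotheses from two languages whose predicates are linked in a small bilingual lexicon, with a bonus for shared semantic roles; the lexicon entries and hypothesis structures are invented, and the actual Eurotra-style representations are considerably richer.

    # Toy alignment of semantic hypotheses across two languages, using a
    # small bilingual lexicon.  All entries are invented for the example.

    bilingual_lexicon = {("meet", "vergaderen"), ("decide", "beslissen")}

    def similarity(hyp_en, hyp_nl):
        """1 if the predicates are lexicon-linked, plus a bonus for shared roles."""
        score = 1.0 if (hyp_en["pred"], hyp_nl["pred"]) in bilingual_lexicon else 0.0
        score += 0.5 * len(set(hyp_en["roles"]) & set(hyp_nl["roles"]))
        return score

    def align(hyps_en, hyps_nl):
        """Greedy one-to-one alignment by descending similarity."""
        pairs = sorted(((similarity(e, n), i, j)
                        for i, e in enumerate(hyps_en)
                        for j, n in enumerate(hyps_nl)), reverse=True)
        used_e, used_n, links = set(), set(), []
        for score, i, j in pairs:
            if score > 0 and i not in used_e and j not in used_n:
                links.append((hyps_en[i]["pred"], hyps_nl[j]["pred"], score))
                used_e.add(i)
                used_n.add(j)
        return links

    hyps_en = [{"pred": "meet", "roles": {"agent"}}]
    hyps_nl = [{"pred": "vergaderen", "roles": {"agent"}}]
    print(align(hyps_en, hyps_nl))   # [('meet', 'vergaderen', 1.5)]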

Download presentation (pdf file)

Computational approaches to creativity

Tom De Smedt — CLIPS, Universiteit Antwerpen

Traditionally, software applications for computer graphics have been based on real-world analogies. Each icon in the application's user interface represents a concrete object: a pen, an eraser, scissors. This model raises creative limitations: features can only be used as implemented by the developers, the screen is too small to display all the features (some are never discovered), and actions are mouse-based, so the user's decision-making process is literally lost in translation.

"NodeBox" is an ongoing effort to produce software that allows more people to express themselves creatively. One of our areas of interest is the way creative ideas are established and how these ideas can be mined from text. Using a number of NLP techniques (shallow parser, semantic network) and drawing inspiration from cognitive processes such as analogy and concept fluidity, the system is able to translate graphically underspecified concepts to something that can be used in a visual representation. For example, "creepy" has no direct visual representation - instead the system could propose you use an image of an octopus for your creepy design. For a given property (e.g. "creepy") and a range of concepts (e.g. animals) it yields the concepts from the range that best resemble the property (the creepiest animals). In this particular example the system will suggest such animals as octopus, bat, crow, locust, mayfly, termite, tick, amphibian, arachnid... No fluffy bunnies or frolicking ponies there!

Organisation: CLiPS, University of Antwerp
Organizing team: Walter Daelemans, Patrick Wambacq, Vincent Van Asch
Sponsored by CLIF
 

Last updated: 15th of December 2009