CLiPS artwork, inspired by natural language processing

In case you were wondering, the website's theme is based on artwork permanently exhibited at our office:

 

Background information

The artwork is a digital illustration printed on aluminum, generated in part with a NodeBox algorithm (http://nodebox.net). It consists of three panels (3.5 x 1 m, 2.5 x 1.5 m, 3 x 1.5 m) inspired by methodologies from the field of natural language processing (NLP). The title of each panel brings to mind the data-information-knowledge model in information science. A description of each panel follows below.

Author: Ludivine Lechat + code written by Tom De Smedt (2010).
Commissioned by CLiPS, with thanks to prof. Walter Daelemans.

 


Panel 1 – "scanning"

In this panel, raw data is depicted artistically as an unorganized jumble of cell-like elements (left corner). Compared to these data cells, the central flower shape looks intricate and elegant. If the cells represent raw data, then the flower surely represents a well-crafted and complex algorithm. It appears to be scanning the data and passing the cells on, one by one, into a "flow" at the right end of the panel.

This might visualize what in NLP is called lexical analysis, where a sequence of characters is transformed into a sequence of tokens. For example, words and sentence periods are detected in a string of characters.
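
To make the idea concrete, here is a minimal Python sketch of lexical analysis. It is not the code behind the artwork, and it is not a full tokenizer: the pattern and the function name tokenize are assumptions chosen for illustration. It simply treats every run of word characters as one token and every other non-space symbol (such as a sentence period) as a token of its own.

```python
import re

# Assumed, simplified pattern: a run of word characters, or one non-space symbol.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")

def tokenize(text):
    """Turn a string of characters into a list of word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)

print(tokenize("The cat sat. The dog barked."))
# ['The', 'cat', 'sat', '.', 'The', 'dog', 'barked', '.']
```

A real tokenizer also has to deal with abbreviations, numbers and contractions, which this sketch ignores.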

This panel is near the office entrance, so visitors are first confronted with the raw data cells.

 

 


Panel 2 – "parsing"

In this panel there is again a contrast between the simple data cells and a cluster of more complex shapes, which seem to be grinding down on the cells. An intense interaction between the two is taking place. Notice how the purple data cell is blocked out; it could represent noise in the data set. Blue cells are then separated from green cells and organized in a loose grid. The cells are starting to form structures.

This might visualize what in NLP is called shallow parsing, where a sequence of tokens is annotated with grammatical information. For example: nouns are distinguished from verbs. Data becomes information.
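
As a rough illustration of annotating tokens with grammatical information, the Python sketch below labels each token with a part of speech. The tiny LEXICON, the tag names and the function tag are all hypothetical; a real shallow parser uses a trained statistical model rather than a lookup table, so this is only a stand-in for the principle.

```python
# Hypothetical toy lexicon mapping lowercase words to part-of-speech tags.
LEXICON = {
    "the": "DET",
    "cat": "NOUN",
    "dog": "NOUN",
    "sat": "VERB",
    "barked": "VERB",
    ".": "PUNCT",
}

def tag(tokens):
    """Pair each token with its tag; words outside the lexicon get 'UNKNOWN'."""
    return [(token, LEXICON.get(token.lower(), "UNKNOWN")) for token in tokens]

print(tag(["The", "cat", "sat", "."]))
# [('The', 'DET'), ('cat', 'NOUN'), ('sat', 'VERB'), ('.', 'PUNCT')]
```

Once the tokens carry labels like these, nouns can be told apart from verbs, much like the blue and green cells in the panel.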

This panel is in the meeting room, where we formulate ideas into concrete projects. The panel reflects this: loose cells are organized into structures.

 

 


Panel 3 – "understanding"

In the last panel, organisms that resemble jellyfish appear to be diligently at work. We already encountered one of these in panel 2, blocking out the purple cell. They could represent programming scripts, statistical functions, regular expressions – things that occur frequently in NLP. Finally, the "flow" ends, and the cells seem to have evolved into large flowers, as elaborate as the scanner or the parser in the previous panels.

These flowers essentially represent understanding: units of human knowledge computationally extracted from raw input data (i.e. language). The "machine" depicted across the panels is learning and creating its own organic constructs.

This panel is in the main office room, where all the research work is done.