pattern.de

The pattern.de module contains a fast, regular expressions-based tagger/chunker for German (identifies nouns, adjectives, verbs, etc. in a sentence) and tools for German verb conjugation and noun singularization & pluralization.

It can be used by itself or with other pattern modules: web | db | en | search | vector | graph.


Documentation

The functions in this module take the same parameters and return the same values as their counterparts in pattern.en. Refer to the documentation there for more details.  

Noun singularization & pluralization

For German nouns there is singularize() and pluralize(). The implementation uses a statistical approach with 84% accuracy for singularization and 72% for pluralization.

>>> from pattern.de import singularize, pluralize
>>> print singularize('Katzen')
>>> print pluralize('Katze')

Katze
Katzen 
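
To make the accuracy figures concrete, the sketch below shows how such a score is commonly computed: the share of test words for which the predicted form equals the expected form. The naive_pluralize() rule and the four word pairs are hypothetical stand-ins for illustration; they are not the rules or the test data used by pattern.de.

```python
def accuracy(inflect, pairs):
    """Return the fraction of (word, expected) pairs that inflect() gets right."""
    correct = sum(1 for word, expected in pairs if inflect(word) == expected)
    return correct / float(len(pairs))

def naive_pluralize(noun):
    # Hypothetical toy rule, NOT the pattern.de implementation:
    # add -n after a final -e, otherwise add -e.
    return noun + "n" if noun.endswith("e") else noun + "e"

# Hypothetical mini test set of (singular, plural) pairs.
PAIRS = [("Katze", "Katzen"), ("Hund", "Hunde"),
         ("Blume", "Blumen"), ("Kind", "Kinder")]

# Three of the four pairs match; "Kind" becomes "Kinde" instead of "Kinder".
score = accuracy(naive_pluralize, PAIRS)
```

Passing pluralize or singularize from pattern.de as the inflect argument, together with a larger list of word pairs, would measure the library functions in the same way.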

Verb conjugation

For German verbs there is conjugate(), lemma(), lexeme() and tenses(). The lexicon for verb conjugation contains about 2,000 common German verbs; for verbs not in the lexicon, it falls back to a rule-based approach with an accuracy of about 87%.

>>> from pattern.de import conjugate, INFINITIVE
>>> print conjugate('war', tense=INFINITIVE)

sein 

German verbs have more tenses than English verbs. In particular, the plural differs for each person:

Tense Alias Example
INFINITIVE "inf" sein
PRESENT_1ST_PERSON_SINGULAR "1sg" ich bin
PRESENT_2ND_PERSON_SINGULAR "2sg" du bist
PRESENT_3RD_PERSON_SINGULAR "3sg" er ist
PRESENT_1ST_PERSON_PLURAL "1pl" wir sind
PRESENT_2ND_PERSON_PLURAL "2pl" ihr seid
PRESENT_3RD_PERSON_PLURAL "3pl" sie sind
PRESENT_PARTICIPLE "part" seiend
PAST_1ST_PERSON_SINGULAR "1sgp" ich war
PAST_2ND_PERSON_SINGULAR "2sgp" du warst
PAST_3RD_PERSON_SINGULAR "3sgp" er war
PAST_1ST_PERSON_PLURAL "1ppl" wir waren
PAST_2ND_PERSON_PLURAL "2ppl" ihr wart
PAST_3RD_PERSON_PLURAL "3ppl" sie waren
PAST_PARTICIPLE "ppart" gewesen

Additionally, there are three moods: imperative, present subjunctive and past subjunctive:

Mood Alias Example
IMPERATIVE_2ND_PERSON_SINGULAR "2sg!" sei
IMPERATIVE_2ND_PERSON_PLURAL "2pl!" seid
PRESENT_SUBJUNCTIVE_1ST_PERSON_SINGULAR "1sg?" ich sei
PRESENT_SUBJUNCTIVE_2ND_PERSON_SINGULAR "2sg?" du seiest
PRESENT_SUBJUNCTIVE_3RD_PERSON_SINGULAR "3sg?" er sei
PRESENT_SUBJUNCTIVE_1ST_PERSON_PLURAL "1pl?" wir seien
PRESENT_SUBJUNCTIVE_2ND_PERSON_PLURAL "2pl?" ihr seiet
PRESENT_SUBJUNCTIVE_3RD_PERSON_PLURAL "3pl?" sie seien
PAST_SUBJUNCTIVE_1ST_PERSON_SINGULAR "1sgp?" ich wäre
PAST_SUBJUNCTIVE_2ND_PERSON_SINGULAR "2sgp?" du wärest
PAST_SUBJUNCTIVE_3RD_PERSON_SINGULAR "3sgp?" er wäre
PAST_SUBJUNCTIVE_1ST_PERSON_PLURAL "1ppl?" wir wären
PAST_SUBJUNCTIVE_2ND_PERSON_PLURAL "2ppl?" ihr wäret
PAST_SUBJUNCTIVE_3RD_PERSON_PLURAL "3ppl?" sie wären
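
The aliases in the two tables follow a regular scheme: a person digit, a number/tense marker (sg, pl, sgp, ppl) and an optional mood marker (! or ?). The sketch below decodes an alias into its parts. It is only our reading of the tables; describe_alias() is a hypothetical helper and not part of pattern.de, and "indicative" is simply the label chosen here for forms without a mood marker.

```python
import re

# Aliases that do not encode a person or number.
SPECIAL = {
    "inf": "infinitive",
    "part": "present participle",
    "ppart": "past participle",
}

# Person digit, then number/tense marker, then optional mood marker.
ALIAS = re.compile(r"^([123])(sgp|ppl|sg|pl)([!?]?)$")
FORMS = {
    "sg": ("present", "singular"),
    "pl": ("present", "plural"),
    "sgp": ("past", "singular"),
    "ppl": ("past", "plural"),
}
MOODS = {"": "indicative", "!": "imperative", "?": "subjunctive"}

def describe_alias(alias):
    """Decode an alias such as "2sgp?" into (person, number, tense, mood)."""
    if alias in SPECIAL:
        return (None, None, SPECIAL[alias], None)
    match = ALIAS.match(alias)
    if match is None:
        raise ValueError("unknown alias: %r" % alias)
    person, form, mood = match.groups()
    tense, number = FORMS[form]
    if mood == "!" and (person != "2" or tense != "present"):
        # The tables list imperatives for the 2nd person present only.
        raise ValueError("unknown alias: %r" % alias)
    return (int(person), number, tense, MOODS[mood])
```

For example, describe_alias("1ppl?") yields (1, "plural", "past", "subjunctive"), which corresponds to PAST_SUBJUNCTIVE_1ST_PERSON_PLURAL (wir wären).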

Attributive & predicative adjectives 

German adjectives inflect with an -e, -em, -en, -er, or -es suffix (e.g., neugierig → die neugierige Katze) depending on gender and role. You can get the base form with the predicative() function, or vice versa with attributive().

For predicative, a statistical approach is used with an accuracy of 98%. For attributive, you need to supply gender (MALE, FEMALE, NEUTRAL) and role (SUBJECT, OBJECT, INDIRECT, PROPERTY) as parameters. The gender() function can be used to guess the gender of a given noun, with about 75% accuracy.

>>> from pattern.de import attributive, predicative
>>> from pattern.de import MALE, FEMALE, SUBJECT, OBJECT, INDIRECT
>>> print predicative('neugierige') 
>>> print attributive('neugierig', gender=FEMALE)
>>> print attributive('neugierig', gender=FEMALE, role=OBJECT)
>>> print attributive('neugierig', gender=FEMALE, role=INDIRECT, article="die")

neugierig
neugierige 
neugierige 
neugierigen 
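
To illustrate how gender and role select the suffix, here is a minimal sketch of the regular endings used after a definite article (der, die, das), where only -e and -en occur. It assumes that SUBJECT, OBJECT, INDIRECT and PROPERTY correspond to the nominative, accusative, dative and genitive case. The function weak_attributive() and its lowercase string arguments are hypothetical; this is not how pattern.de implements attributive(), and adjectives with irregular stems (e.g., teuer → teure, hoch → hohe) are not handled.

```python
# Lowercase strings stand in for the MALE/FEMALE/NEUTRAL and
# SUBJECT/OBJECT/INDIRECT/PROPERTY constants; they are illustrative only.
# After a definite article, these (gender, role) pairs take -e; all others take -en.
E_FORMS = set([
    ("male", "subject"),
    ("female", "subject"), ("female", "object"),
    ("neutral", "subject"), ("neutral", "object"),
])

def weak_attributive(adjective, gender, role="subject"):
    """Inflect a regular adjective as used after a definite article."""
    suffix = "e" if (gender, role) in E_FORMS else "en"
    # Avoid a doubled -e for adjectives that already end in -e (e.g., "leise").
    stem = adjective[:-1] if adjective.endswith("e") else adjective
    return stem + suffix
```

With this table, weak_attributive("neugierig", "female", "indirect") gives "neugierigen", in line with the last example above.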

Parser

For parsing there is parse() and split(). Words processed with parse() are assigned tags such as NN (nouns) or VB (verbs). See the pattern.en documentation for how to manipulate the Sentence objects returned from split().

>>> from pattern.de import parse, split
>>> s = parse('Die Katze liegt auf der Matte.')
>>> s = split(s)
>>> print s.sentences[0]

Sentence('Die/DT/B-NP/O Katze/NN/I-NP/O liegt/VB/B-VP/O '
         'auf/IN/B-PP/B-PNP der/DT/B-NP/I-PNP Matte/NN/I-NP/I-PNP ././O/O')
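
Each token in this output carries four slash-separated fields, which we read as word, part-of-speech tag, chunk tag and prepositional-phrase tag. The sketch below works on the plain tagged string rather than on pattern objects, so it runs without the library. The helpers read_tokens() and noun_phrases() are hypothetical and only illustrate the format.

```python
TAGGED = ("Die/DT/B-NP/O Katze/NN/I-NP/O liegt/VB/B-VP/O "
          "auf/IN/B-PP/B-PNP der/DT/B-NP/I-PNP Matte/NN/I-NP/I-PNP ././O/O")

def read_tokens(tagged):
    """Split a tagged sentence into (word, tag, chunk, pnp) tuples."""
    # rsplit keeps any extra slash inside the word part of a token.
    return [tuple(token.rsplit("/", 3)) for token in tagged.split(" ")]

def noun_phrases(tokens):
    """Group words into noun phrases: B-NP opens a phrase, I-NP continues it."""
    phrases = []
    for word, tag, chunk, pnp in tokens:
        if chunk == "B-NP":
            phrases.append([word])
        elif chunk == "I-NP" and phrases:
            phrases[-1].append(word)
    return [" ".join(words) for words in phrases]

tokens = read_tokens(TAGGED)
nouns = [word for word, tag, chunk, pnp in tokens if tag.startswith("NN")]
phrases = noun_phrases(tokens)
```

Here nouns holds the two words tagged NN (Katze and Matte), and phrases holds the two NP chunks (Die Katze and der Matte).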

The parser is built on Gerold Schneider & Martin Volk's German language model. The reported accuracy is around 95% (for 15% unknown words), but the score for the implementation in Pattern can vary slightly, since the original STTS tagset is mapped to the Penn Treebank tagset. If you need to work with the original tags, you can call parse() with the optional parameter tagset="STTS".

Reference: Schneider, G., Volk, M. (1998). Adding manual constraints and lexical look-up to a Brill-tagger for German. Proceedings of ESSLLI-98. 

Sentiment analysis

There's no sentiment() function for German yet.