GermaNet vs. Other Wordnet Alternatives

1. Introduction

A wordnet is a lexical database that links words through semantic relations such as synonymy, hyponymy, and meronymy. Synonyms are grouped into synsets, each with a short definition and usage examples. A wordnet can be seen as a combination and extension of a dictionary and a thesaurus. The first wordnet was created for English (the Princeton Wordnet), and software tools are available to access and use wordnets. The Princeton Wordnet has around 120 000 synsets. GermaNet is a manually constructed wordnet for German with approximately 170 000 synsets. OdeNet is an open source German wordnet with about 36 000 synsets. In addition, we constructed a German wordnet by context-based machine translation of the English Princeton Wordnet; we refer to it as the "Inferred German Wordnet". It contains the same 120 000 synsets as the Princeton Wordnet.
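To make the terms synset, synonym, and hyponym concrete, the following minimal Python sketch models a wordnet as a dictionary of synsets. The class name `Synset`, the helper functions, and the three entries are invented for illustration only; they are not taken from GermaNet, OdeNet, or the Princeton Wordnet, and real wordnets store many more relation types.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """One concept: a group of synonymous lemmas with a short definition."""
    synset_id: str
    lemmas: list[str]
    definition: str
    # Ids of more general synsets (hypernyms); the reverse direction is hyponymy.
    hypernyms: list[str] = field(default_factory=list)


# Toy data, invented for this example (not from any real wordnet).
synsets = {
    "s1": Synset("s1", ["Gebäude", "Bauwerk"], "a structure with walls and a roof"),
    "s2": Synset("s2", ["Haus"], "a building for people to live in", hypernyms=["s1"]),
    "s3": Synset("s3", ["Hochhaus"], "a very tall house", hypernyms=["s2"]),
}


def synonyms(word: str) -> set[str]:
    """Return all lemmas that share at least one synset with `word`."""
    found = set()
    for synset in synsets.values():
        if word in synset.lemmas:
            found.update(lemma for lemma in synset.lemmas if lemma != word)
    return found


def hypernym_chain(synset_id: str) -> list[str]:
    """Follow the first hypernym link upwards and collect the first lemma of each synset."""
    chain = []
    current = synsets[synset_id]
    while current.hypernyms:
        current = synsets[current.hypernyms[0]]
        chain.append(current.lemmas[0])
    return chain
```

With this toy data, `synonyms("Gebäude")` yields the other member of the same synset, and `hypernym_chain("s3")` walks from the compound word up to its more general concepts.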

2. Comparative Overview

GermaNet is a high-quality German wordnet, constructed by hand. Its use requires a licence, and the fee depends on how the wordnet will be used. Typically, licensing fees for academic use are far lower than those for commercial use. OdeNet and the Inferred German Wordnet are open source and free to use, but they were constructed programmatically and consequently did not undergo the same rigorous quality checks as GermaNet.

Measured by synset count, GermaNet is more complete than the other two wordnets: it has approximately 170 000 synsets, compared to 36 000 in OdeNet and 120 000 in the Inferred German Wordnet.

A large part of the German vocabulary consists of compound words (e.g. Haus -> Bauhaus, Puppenhaus), and in this regard GermaNet has a huge advantage over the other wordnets. Although OdeNet was also constructed from scratch and has some compound words, it only has 36 000 synsets compared to the 170 000 in GermaNet. The Inferred German Wordnet has fewer compound words, since it was constructed from a context-based machine translation of the Princeton Wordnet. German also has subject-specific vocabulary such as "Vormieter" and "Nachmieter", which are used specifically in the context of renting real estate. GermaNet contains much more of this subject-specific German vocabulary than the other two wordnets.
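One way to check such coverage claims on your own vocabulary is to count how many words from a test list appear as lemmas in each wordnet. The sketch below assumes the lemma inventory of each wordnet has already been exported into a Python set. The names `wordnet_a` to `wordnet_c` and their contents are placeholders, not measurements of the three wordnets discussed here.

```python
def coverage(words, lemmas):
    """Return the fraction of `words` that occur in the lemma set `lemmas`."""
    if not words:
        return 0.0
    return sum(1 for word in words if word in lemmas) / len(words)


# Test vocabulary: compound and subject-specific words mentioned in the text.
test_words = ["Haus", "Puppenhaus", "Vormieter", "Nachmieter"]

# Placeholder lemma inventories (hypothetical, for illustration only).
wordnets = {
    "wordnet_a": {"Haus", "Puppenhaus", "Vormieter", "Nachmieter"},
    "wordnet_b": {"Haus", "Puppenhaus"},
    "wordnet_c": {"Haus"},
}

report = {name: coverage(test_words, lemmas) for name, lemmas in wordnets.items()}
```

A domain-specific word list (for example, real estate terms) gives a more meaningful comparison than raw synset counts, because it measures the vocabulary an application actually needs.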

One disadvantage of GermaNet is that it does not seem to contain any adverbs. OdeNet does not have many adverbs either. The Inferred German Wordnet, on the other hand, contains the same number of adverbs as the Princeton Wordnet, which is quite a lot.

The adjectives in GermaNet have hypernym and hyponym links to other synsets, while the adjectives in the Inferred German Wordnet and OdeNet do not have these relations.

The Interlingual Index (ILI) was introduced to connect synsets across different languages. If the same ILI is assigned to synsets in the wordnets of two different languages, these two synsets convey more or less the same meaning. In this regard, the Inferred German Wordnet does best, because every one of its synsets has an existing link to a synset in the Princeton Wordnet. GermaNet has a limited number of ILIs, and not all of them are correct. OdeNet has more ILIs, but also with some mistakes. The difficulties that GermaNet and OdeNet have with correct ILI assignment stem from the fact that these wordnets were initially constructed from scratch for German only; suitable ILI matches in other languages were sought only at a later stage.
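The linking mechanism described above amounts to a join on a shared identifier. The following sketch assumes each synset record carries an optional `ili` field. The synset ids, the ILI values, and the function names are hypothetical and do not reflect the actual identifiers used by any of the wordnets.

```python
# Hypothetical synset records; "ili" is None when no ILI has been assigned.
german = {
    "de-001": {"lemmas": ["Hund"], "ili": "i1"},
    "de-002": {"lemmas": ["Vormieter"], "ili": None},
}
english = {
    "en-101": {"lemmas": ["dog"], "ili": "i1"},
}


def align_by_ili(source, target):
    """Map each source synset id to the target synset id that shares its ILI."""
    target_by_ili = {
        synset["ili"]: synset_id
        for synset_id, synset in target.items()
        if synset["ili"] is not None
    }
    return {
        synset_id: target_by_ili[synset["ili"]]
        for synset_id, synset in source.items()
        if synset["ili"] in target_by_ili
    }


def ili_share(wordnet):
    """Return the fraction of synsets that have an ILI assigned."""
    if not wordnet:
        return 0.0
    return sum(1 for s in wordnet.values() if s["ili"] is not None) / len(wordnet)


alignment = align_by_ili(german, english)
```

Note that `ili_share` only measures how many synsets carry an ILI, not whether the assignments are correct; judging correctness still requires manual review of the matched pairs.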

3. Summary

| GermaNet | OdeNet | Inferred German Wordnet |
| --- | --- | --- |
| Not free to use (complicated licensing cost structure) | Free to use - open source | Free to use - open source |
| 170 000 synsets | 36 000 synsets | 120 000 synsets |
| Very high quality, comprehensive wordnet, constructed by hand | Good wordnet, with a limited number of synsets and some mistakes (especially concerning relational connections between synsets) - constructed automatically from scratch, using open source German linguistic sources | Very good wordnet, with some translation mistakes (especially concerning idiomatic language usage) - constructed automatically via context-based machine translation |
| The most coverage of German compound words | Some coverage of German compound words | Limited coverage of German compound words |
| Very good coverage of subject-specific vocabulary | Some coverage of subject-specific vocabulary | Limited coverage of subject-specific vocabulary |
| No adverbs | Limited number of adverbs | Most adverbs (same as Princeton Wordnet) |
| Adjectives have links to other synsets | Adjectives have no links to other synsets | Adjectives have no links to other synsets |
| Very limited connection to similar synsets in other languages via the ILI, and with some matching mistakes | Good connection to synsets in other languages via the ILI, but with some mistakes (not all synsets are matched correctly) | Excellent connection to synsets in other languages via the ILI (all synsets have an ILI link to synsets in the Princeton Wordnet) and matches are correct |

4. Conclusion

GermaNet offers clear advantages over the other wordnets in terms of quality and quantity. Its major drawback is the uncertainty about licensing costs. The Inferred German Wordnet is a good-quality, usable wordnet, with some limitations regarding richness of vocabulary. OdeNet is a relatively good open source wordnet, but with more limitations than the Inferred German Wordnet.

Recommendation: Since the Inferred German Wordnet is usable and open source, it can be used to build a prototype that illustrates how wordnets can be useful for WLO. It does not make sense to invest in a GermaNet licence before being sure that GermaNet will be used extensively in WLO.