WordNet Prototype Application Analysis
1. Introduction
In this article we reflect on some of the insights gained from implementing a Wordnet Prototype Application (https://edu.yovisto.com/wordnet). Two previous articles explained the advantages of Wordnets (Wordnets) and compared the different Wordnets available for the German language (GermaNet vs. Other Wordnet Alternatives). The latter article analysed three German Wordnets: GermaNet, OdeNet and the Inferred German Wordnet. It concluded that GermaNet has clear advantages over the other two, mainly because of its superior content quality and quantity. Unlike the other two German Wordnets, GermaNet is not open source and is subject to licensing costs. The recommendation was therefore to build a prototype Wordnet application with one of the open source Wordnets and to evaluate its practical usefulness before investing in GermaNet. In the following sections, we report on the results of the prototype implementation.
2. Browsability and visualisation
The prototype application showed that it is easy to browse for synsets (groups of synonymous words that share the same contextual meaning) and to show the meaning of words in context. Browsing includes searching for synsets and navigating parent-child and part-whole relationships.
Graph-based visualisations were used to show how a synset is related to other synsets. These visualisations are dynamic in the sense that they can be adjusted to display more information, e.g. more synonyms in a synset, or expansion of the graph to show more contextual information.
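The browsing operations described in this section can be sketched in a few lines of Python. This is a minimal illustration and not the prototype's actual code: the `Synset` and `WordnetGraph` classes, the relation names and the four toy synsets are all assumptions made for the example. It shows search by word, parent-child navigation, and how a graph view can be expanded step by step to reveal more context.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A group of synonymous words sharing one contextual meaning."""
    synset_id: str
    pos: str
    lemmas: list
    hypernyms: list = field(default_factory=list)  # ids of parent synsets
    holonyms: list = field(default_factory=list)   # ids of synsets this one is part of


class WordnetGraph:
    """Hypothetical in-memory Wordnet, keyed by synset id."""

    def __init__(self, synsets):
        self.synsets = {s.synset_id: s for s in synsets}

    def search(self, word):
        """Return all synsets that contain the word (case-insensitive)."""
        needle = word.lower()
        return [s for s in self.synsets.values()
                if needle in (lemma.lower() for lemma in s.lemmas)]

    def parents(self, synset_id):
        return [self.synsets[p] for p in self.synsets[synset_id].hypernyms]

    def children(self, synset_id):
        return [s for s in self.synsets.values() if synset_id in s.hypernyms]

    def edges_of(self, synset_id):
        """All edges touching a synset, as (source, relation, target).

        (a, "hypernym", b) reads "b is the parent of a";
        (a, "part_of", b) reads "a is a part of b".
        """
        node = self.synsets[synset_id]
        edges = {(synset_id, "hypernym", p) for p in node.hypernyms}
        edges |= {(synset_id, "part_of", w) for w in node.holonyms}
        for other in self.synsets.values():
            if synset_id in other.hypernyms:
                edges.add((other.synset_id, "hypernym", synset_id))
            if synset_id in other.holonyms:
                edges.add((other.synset_id, "part_of", synset_id))
        return edges

    def neighbourhood(self, synset_id, depth=1):
        """Edges within `depth` steps; a larger depth expands the view."""
        edges, seen, frontier = set(), {synset_id}, {synset_id}
        for _ in range(depth):
            next_frontier = set()
            for current in frontier:
                for source, relation, target in self.edges_of(current):
                    edges.add((source, relation, target))
                    for node_id in (source, target):
                        if node_id not in seen:
                            seen.add(node_id)
                            next_frontier.add(node_id)
            frontier = next_frontier
        return edges


# Toy data, invented for illustration only.
graph = WordnetGraph([
    Synset("element", "NOUN", ["chemisches Element", "Element"]),
    Synset("oxygen", "NOUN", ["Sauerstoff"],
           hypernyms=["element"], holonyms=["air"]),
    Synset("hydrogen", "NOUN", ["Wasserstoff"], hypernyms=["element"]),
    Synset("air", "NOUN", ["Luft"]),
])

for edge in sorted(graph.neighbourhood("oxygen", depth=2)):
    print(edge)
```

Returning plain edge triples keeps the sketch independent of any particular visualisation library; a front end could draw the triples returned for `depth=1` first and request a larger depth when the user expands the graph.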
3. Embedding Wordnets inside free-text
The previous section described how we can browse, navigate and visualise Wordnets. Because Wordnets are represented as graphs, browsing them gives us additional context: we can see how words are related to other words. Conventional electronic dictionaries, such as electronic thesauri, are list-based and do not provide this advantage.
We want to explore the usefulness of Wordnets further by looking at how they can be applied inside software products. In the prototype application, we dynamically analysed text supplied by the user, either as free text or as a website URL. First, a spaCy language model (spaCy · Industrial-strength Natural Language Processing in Python) was used to identify the part of speech (POS) of each word in the text. Second, a word-vector algorithm was used to link the words in the text to synsets in the Wordnet graph. For each word, all candidate synsets (i.e. synsets with the same POS as the word in the text) were evaluated and assigned a weight based on the result of the word-vector algorithm, and the synset most suitable for the given context was selected.
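The two-step linking procedure can be illustrated with a small sketch. The article does not specify the word-vector algorithm, so the scoring used here is an assumption: the context is represented by the mean vector of the surrounding words, each candidate synset by the mean vector of its other lemmas, and the weight is their cosine similarity. The function names, the three-dimensional toy vectors and the candidate synsets are all invented for the example; in practice the POS tags and vectors would come from a language model such as spaCy (for instance via `token.pos_` and `token.vector`).

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_vector(words, vectors):
    """Average the vectors of all words that have one; None if none do."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    return [sum(v[i] for v in known) / len(known) for i in range(len(known[0]))]


def rank_synsets(word, pos, context_words, candidates, vectors):
    """Weight every candidate synset with a matching POS, best first."""
    context = mean_vector(context_words, vectors)
    ranked = []
    for synset in candidates:
        if synset["pos"] != pos:  # step 1: keep only synsets with the same POS
            continue
        # Step 2 (assumed scoring): compare the context with the synset's
        # other lemmas, since the target word itself occurs in every candidate.
        others = [lemma for lemma in synset["lemmas"] if lemma != word]
        sense = mean_vector(others, vectors)
        weight = cosine(context, sense) if context and sense else 0.0
        ranked.append((synset["id"], weight))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors, invented for illustration only.
VECTORS = {
    "Geld": [0.9, 0.1, 0.0],
    "Konto": [0.8, 0.2, 0.0],
    "Geldinstitut": [0.9, 0.0, 0.1],
    "Sitzbank": [0.0, 0.9, 0.2],
    "Park": [0.1, 0.8, 0.3],
}

# Hypothetical candidate synsets for the ambiguous German noun "Bank".
CANDIDATES = [
    {"id": "bank_finance", "pos": "NOUN", "lemmas": ["Bank", "Geldinstitut"]},
    {"id": "bank_seat", "pos": "NOUN", "lemmas": ["Bank", "Sitzbank"]},
    # Artificial entry, present only to show the POS filter at work.
    {"id": "dummy_verb_sense", "pos": "VERB", "lemmas": ["Bank"]},
]

money_ranking = rank_synsets("Bank", "NOUN", ["Geld", "Konto"], CANDIDATES, VECTORS)
park_ranking = rank_synsets("Bank", "NOUN", ["Park"], CANDIDATES, VECTORS)
print(money_ranking[0][0], park_ranking[0][0])
```

With these toy vectors, a financial context ranks the financial sense first and a park context ranks the seating sense first, while the candidate with a different POS is never scored.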
We propose that this same method can be used to analyse learning material in curricula. Students would then be able to gain deeper insight into a text by inspecting the meaning of its words in more detail. Visualising words can be especially useful for curricular learning material. For example, a technical term such as 'Oxygen' can be visualised in a graph context, enabling the student to see other related terms and how they are connected to 'Oxygen'.
4. Wordnet usage in the world
Wordnets are primarily used in research contexts. Many papers on Wordnet topics are published every year, with the Global WordNet Association (http://globalwordnet.org/) as the leading role-player in this regard. It organises an international conference every two years, with publications focusing solely on Wordnet-related themes.
The few commercial applications that do use Wordnets are almost exclusively in English. There is a clear gap in the market with regard to the usage of non-English Wordnets. This becomes evident when looking at BabelNet (https://babelnet.org/), a large European research project that brings together Knowledge Graph-based linguistic data. Although BabelNet gathers Knowledge Graph-based information for different languages and from different sources, its Wordnet information is solely in English.
5. Conclusion
The Wordnet prototype application illustrated how useful Wordnets can be for improving the understanding of text and technical terms through additional context and visualisations. The ideas in the prototype can also be applied to learning material in the WLO context.
The prototype application was developed with the Inferred German Wordnet. We received positive feedback, and using the prototype's functionality in WLO appears promising. Several external parties have also expressed interest in collaborating with us on Wordnet projects, especially within the context of curricula and language learning. The recommendation is therefore to invest in the licensing costs of GermaNet: it will provide improvements in quality and quantity, and it is highly likely to be used in WLO and potentially in projects with other WLO partners.