Document toolbox

Yovisto indexing (Erschließung)

Internal documents

https://docs.google.com/presentation/d/1UOz8-giWWcXGQtEiayIRVikyplmEUkwfzsXBiYAk5A8/edit#slide=id.gf818bbfad4_0_461
https://docs.google.com/presentation/d/15-EYp3rs1fbhcoI6oCDwkKSqpRdU0N4rx49xTsElW5Q/edit#slide=id.g19c6a5d778_0_45
https://docs.google.com/document/d/1bgFXx9c5m_A0G7V7EgsAcmmin0N6zcbQ4_vGH0h9IrE/edit

https://github.com/yovisto?tab=repositories

Topic Assistant

https://github.com/yovisto/wlo-topic-assistant

“A utility to map arbitrary text to the WLO/OEH topics vocabulary based on keyword matching.”

This is the only endpoint currently called from the editorial environment (Redaktionsumgebung).

Endpoint: POST /topics

Uses these vocabulary files:

https://github.com/openeduhub/oeh-metadata-vocabs/blob/master/oehTopics.ttl

https://raw.githubusercontent.com/openeduhub/oeh-metadata-vocabs/master/discipline.ttl
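
The keyword-matching idea from the repository description can be illustrated with a minimal sketch. This is not the Topic Assistant's actual implementation: the vocabulary below is a made-up miniature stand-in for oehTopics.ttl, and the function name match_topics is hypothetical.

```python
import re
from collections import Counter

# Hypothetical miniature vocabulary: topic label -> keywords.
# The real service reads its topics from oehTopics.ttl; these entries are invented.
VOCAB = {
    "Geometrie": {"kreis", "dreieck", "winkel"},
    "Algebra": {"gleichung", "variable", "term"},
}


def match_topics(text, vocab=VOCAB):
    """Return (topic, hit_count) pairs, topics with the most keyword hits first."""
    # Lowercase the text and count each word once per occurrence.
    tokens = Counter(re.findall(r"\w+", text.lower()))
    hits = {}
    for topic, keywords in vocab.items():
        # A Counter returns 0 for words that do not occur in the text.
        count = sum(tokens[keyword] for keyword in keywords)
        if count:
            hits[topic] = count
    # Sort by descending hit count, then alphabetically for stable output.
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))
```

Calling match_topics on a short text returns only the topics with at least one keyword hit, so a text without any vocabulary word yields an empty list.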



Language Detection


https://github.com/RMeissnerCC/wlo-langdetect


Uses the Python library langdetect to detect the language of a text. It has also been used in MetaQS.


Classification


https://github.com/RMeissnerCC/wlo-classification


Predicts the discipline of a piece of content.

Returns an integer that corresponds to the predicted discipline.


Deduplication


https://github.com/yovisto/wlo-duplicate-detection


Performs duplicate detection only, based on pre-trained text data.

Uses the MinHash algorithm.


Currently trained on the id, url, and description fields of some data.

  • The age and state of that data are unclear.
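
For orientation, the MinHash technique can be sketched in a few lines of standard-library Python. This is a generic illustration, not the code in wlo-duplicate-detection: the shingle size, the signature length of 64, the SHA-1 based hash family, and all function names are assumptions made for this sketch.

```python
import hashlib

NUM_HASHES = 64  # signature length; an arbitrary choice for this sketch


def shingles(text, k=3):
    """Return the set of k-word shingles of a text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, shingle):
    # Simulate a family of hash functions by prefixing a seed to the input.
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")


def minhash_signature(text, num_hashes=NUM_HASHES):
    """For each seeded hash function, keep the minimum hash over all shingles."""
    items = shingles(text)
    if not items:
        raise ValueError("text has no words")
    return [min(_hash(seed, s) for s in items) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature positions estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

Two descriptions whose estimated similarity exceeds a chosen threshold would be flagged as likely duplicates. Which threshold the service applies is not documented here.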


Recommender


https://github.com/yovisto/wlo-recommender



Returns: "a list of scores and document ids relevant to the query document. Only the top ten items are retrieved, in descending order"
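
The shape of that return value can be illustrated with a small sketch that selects the ten best-scoring documents in descending order. How the recommender computes its scores is not documented here, so the input dictionary, the document ids, and the function name top_k are hypothetical.

```python
import heapq


def top_k(scores, k=10):
    """Return up to k (score, document_id) pairs, highest score first.

    scores: dict mapping a document id to its relevance score.
    """
    best = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
    # Put the score first to mirror "a list of scores and document ids".
    return [(score, doc_id) for doc_id, score in best]
```

With fewer than ten candidates, the function simply returns all of them, still sorted by descending score.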


The model is pretrained, but details are unclear. Is it the same model as in Classification?


Analysis service


https://github.com/yovisto/wlo-analysis-service


Analyzes the OER object itself and yields categories, e.g., the categories to which its title, description, and keywords could belong.


Needs training: for example, "Kreissektor" (circular sector) is currently connected to "Allgemeine Psychologie" (general psychology).


Metadata Mappings


https://github.com/yovisto/wlo-metadata-mappings


Various mappings, e.g., OER categories to Wikipedia.


Some links are outdated.


No REST endpoint or similar.