Test "Notes".[a]

Wiktionary

Wiktionary data are heavily used in various NLP tasks (see the section "Wiktionary data in NLP").

Wiktionary data in NLP

Wiktionary contains semi-structured data.[2] Its lexicographic data must be converted to a machine-readable format before they can be used in natural language processing tasks.[3][4][5]
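As a rough illustration of what "machine-readable" can mean here, the sketch below stores one entry as a typed record and serializes it to JSON. The class names `Entry` and `Sense`, the field names, and the sample values are assumptions made for this example only; they are not the schema of any of the parsers listed in this section.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Sense:
    """One numbered definition of a word (hypothetical structure)."""
    definition: str
    synonyms: list[str] = field(default_factory=list)


@dataclass
class Entry:
    """One word in one language with one part of speech (hypothetical structure)."""
    word: str
    language: str
    part_of_speech: str
    senses: list[Sense] = field(default_factory=list)
    # Maps a language code to the translations into that language.
    translations: dict[str, list[str]] = field(default_factory=dict)


# Hand-written sample values, not extracted from a real dump.
entry = Entry(
    word="dog",
    language="English",
    part_of_speech="noun",
    senses=[Sense("A domesticated mammal.", synonyms=["hound"])],
    translations={"de": ["Hund"], "ru": ["собака"]},
)

# asdict() converts nested dataclasses to plain dicts, which json can serialize.
serialized = json.dumps(asdict(entry), ensure_ascii=False, indent=2)
restored = json.loads(serialized)
```

Once entries are in a form like this, downstream NLP code can query fields directly instead of re-reading the wiki markup.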

Mining Wiktionary data is a complex task. The main difficulties are:[6] (1) constant and frequent changes to the data and schema, (2) heterogeneity among the schemas of the Wiktionary language editions,[b] and (3) the human-centric nature of a wiki.

There are several parsers for different Wiktionary language editions:[7]

  • DBpedia Wiktionary: a subproject of DBpedia that extracts data from the English, French, German and Russian Wiktionaries. The data include language, part of speech, definitions, semantic relations and translations. A declarative description of the page schema,[8] regular expressions[9] and a finite-state transducer[10] are used to extract the information.
  • JWKTL (Java Wiktionary Library): provides access to English Wiktionary and German Wiktionary dumps via a Java API.[11] The data include language, part of speech, definitions, quotations, semantic relations, etymologies and translations. JWKTL is available for non-commercial use.
  • wikokit: a parser for English Wiktionary and Russian Wiktionary.[12] The parsed data include language, part of speech, definitions, quotations,[13][c] semantic relations[14] and translations. It is multi-licensed open-source software.
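The general idea of combining a declarative schema description with regular expressions can be sketched as follows. This is not the code of DBpedia Wiktionary, JWKTL or wikokit. The `SCHEMA` table, the `parse_entry` function and the sample text are all hypothetical, and the patterns assume a much simplified English-style layout (`==Language==`, `===Part of speech===`, `# definition`). A real entry has many more section types and nesting levels, and every other language edition would need its own table of patterns.

```python
import re

# Declarative description of the page layout, one table per language edition.
# Only a simplified English-style layout is described (an assumption).
SCHEMA = {
    "en": {
        "language": re.compile(r"^==([^=]+)==\s*$"),
        "pos": re.compile(r"^===([^=]+)===\s*$"),
        "definition": re.compile(r"^#\s+(.+)$"),
    },
}


def parse_entry(wikitext, edition="en"):
    """Return a list of {language, pos, definition} records from wikitext."""
    patterns = SCHEMA[edition]
    records = []
    language = pos = None
    for line in wikitext.splitlines():
        if m := patterns["language"].match(line):
            # A new language section resets the current part of speech.
            language, pos = m.group(1).strip(), None
        elif m := patterns["pos"].match(line):
            pos = m.group(1).strip()
        elif (m := patterns["definition"].match(line)) and language and pos:
            records.append(
                {"language": language, "pos": pos, "definition": m.group(1).strip()}
            )
    return records


# Invented sample text in the simplified layout, not a real Wiktionary page.
SAMPLE = """==English==
===Noun===
# A domesticated mammal.
# A contemptible person.
===Verb===
# To follow persistently.
==French==
===Noun===
# An example sense.
"""

records = parse_entry(SAMPLE)
```

Keeping the patterns in a table rather than in the parsing loop means that a change to the page layout, one of the difficulties mentioned above, only requires editing the table.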

Wiktionary data have been used to solve various natural language processing tasks.[15]

Notes

  1. ^ Explanatory note example with reference.[1]
  2. ^ E.g. compare the entry structure and formatting rules in English Wiktionary and Russian Wiktionary.
  3. ^ Quotations are extracted only from Russian Wiktionary.[13]
  4. ^ If a Wiktionary page has several IPA notations, whether for different languages or for pronunciation variants, only the first pronunciation was extracted.[19]
  5. ^ http://conceptnet5.media.mit.edu
  6. ^ The source code and the results of POS-tagging are available at https://code.google.com/p/wikily-supervised-pos-tagger

Citations

References

  • Chesley, Paula; Vincent, Bruce; Xu, Li; Srihari, Rohini K. (2006). "Using verbs and adjectives to automatically classify blog sentiment" (PDF). Training. 580: 233–235. Retrieved May 9, 2013.
  • Krizhanovsky, Andrew (2010). "Transformation of Wiktionary entry structure into tables and relations in a relational database schema". arXiv:1011.1368 [cs].
  • Krizhanovsky, Andrew (2010). "The comparison of Wiktionary thesauri transformed into the machine-readable format". arXiv:1006.5040 [cs].
  • Li, Shen; Graça, Joao V.; Taskar, Ben (2012). "Wiki-ly supervised part-of-speech tagging" (PDF). Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Jeju Island, Korea: Association for Computational Linguistics. pp. 1389–1398.
  • McFate, Clifton J.; Forbus, Kenneth D. (2011). "NULEX: an open-license broad coverage lexicon" (PDF). The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference. Portland, Oregon, USA: Association for Computational Linguistics. pp. 363–367. ISBN 978-1-932432-88-6.