The word sense induction and disambiguation task consisted of three separate phases:
- In the training phase, participants were asked to use a training dataset to induce the sense inventories for a set of polysemous words. The training dataset consisted of a set of polysemous nouns and verbs and the sentence instances in which they occurred. No resources were allowed other than morphological and syntactic natural language processing components, such as morphological analyzers, part-of-speech taggers and syntactic parsers.
- In the testing phase, participants were provided with a test set for the disambiguation subtask, which they performed using the sense inventory induced in the training phase.
- In the evaluation phase, answers to the testing phase were evaluated in a supervised and an unsupervised framework.
The unsupervised evaluation for WSI considered two measures: V-measure (Rosenberg and Hirschberg, 2007) and paired F-score (Artiles et al., 2009). This evaluation follows the supervised evaluation of the SemEval-2007 WSI task (Agirre and Soroa, 2007).
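The two unsupervised measures can be illustrated with a minimal Python sketch. It assumes each test instance carries one gold sense label and one induced cluster label, given as two parallel lists; the function names are illustrative and this is not the official task scorer. V-measure is computed as the harmonic mean of homogeneity and completeness, and the paired F-score compares the sets of instance pairs that share a label in each labelling.

```python
from collections import Counter
from itertools import combinations
from math import log


def _entropy(labels):
    """Shannon entropy H(labels) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def _conditional_entropy(target, given):
    """Conditional entropy H(target | given) for two parallel label lists."""
    n = len(target)
    joint = Counter(zip(target, given))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (_, g), c in joint.items())


def v_measure(gold, induced):
    """Harmonic mean of homogeneity and completeness."""
    h_gold, h_induced = _entropy(gold), _entropy(induced)
    # By convention, a measure is 1 when the corresponding entropy is zero.
    homogeneity = 1.0 if h_gold == 0 else 1.0 - _conditional_entropy(gold, induced) / h_gold
    completeness = 1.0 if h_induced == 0 else 1.0 - _conditional_entropy(induced, gold) / h_induced
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


def _same_label_pairs(labels):
    """All index pairs (i, j), i < j, whose instances share a label."""
    return {(i, j) for i, j in combinations(range(len(labels)), 2) if labels[i] == labels[j]}


def paired_f_score(gold, induced):
    """F-score over instance pairs placed together by both labellings."""
    gold_pairs = _same_label_pairs(gold)
    induced_pairs = _same_label_pairs(induced)
    if not gold_pairs or not induced_pairs:
        return 0.0
    common = len(gold_pairs & induced_pairs)
    precision = common / len(induced_pairs)
    recall = common / len(gold_pairs)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, `v_measure(["a", "a", "b", "b"], [0, 0, 1, 1])` and the corresponding `paired_f_score` call both score a perfect clustering, while putting all four instances into one cluster lowers both scores.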
Word Sense Induction and Disambiguation Example
Often in the induction process, stop words are considered to be semantically irrelevant and hence are not used when building the sense inventory. The induction process outputs clusters of candidate senses that are related to a certain latent semantic variable or sense cluster. Note that these sets of candidate senses should not be regarded as lexicographic meaning distinctions (like synsets in WordNet or BabelNet). Rather, they should be regarded as more coarse-grained, topic-related entities[1].
Target word: chip
Occurs in the contexts[2]:
"An N.V. Philips unit has created a computer system that processes video images 3,000 times faster than conventional systems."
"Using reduced instruction-set computing, or RISC, chips made by Intergraph of Huntsville, Ala., the system splits the image it ‘sees’ into 20 digital representations, each processed by one chip."
Induced senses {Centroid:: Candidate senses}: {computer:: cache, CPU, memory, microprocessor, processor, RAM, register}
Disambiguation of the target word in context (a.k.a. coarse-grained sense):
{computer}
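The disambiguation step can be sketched in Python as a simple overlap count between the content words of a context and each induced sense. This is a simplified stand-in, not the latent semantic method of the cited work. The stop-word list is a tiny illustrative assumption, and the "food" sense is hypothetical, added only so that the disambiguator has more than one sense to choose from.

```python
import re

# Tiny illustrative stop-word list (an assumption; real systems use larger lists).
STOP_WORDS = {"a", "an", "the", "has", "that", "than", "of", "by", "or",
              "it", "into", "each", "one"}

# The "computer" sense comes from the example; the "food" sense is
# hypothetical and exists only to give the disambiguator a choice.
INDUCED_SENSES = {
    "computer": {"cache", "cpu", "memory", "microprocessor", "processor",
                 "ram", "register"},
    "food": {"potato", "snack", "salt", "crisp"},
}


def content_words(text):
    """Lowercase alphabetic tokens of the text with stop words removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def disambiguate(context, senses):
    """Return the centroid whose vocabulary overlaps the context most, or None."""
    words = set(content_words(context))
    # The centroid itself counts as part of the sense vocabulary.
    scores = {
        centroid: len(words & (candidates | {centroid}))
        for centroid, candidates in senses.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


context = ("An N.V. Philips unit has created a computer system that processes "
           "video images 3,000 times faster than conventional systems.")
print(disambiguate(context, INDUCED_SENSES))
```

Counting raw overlap keeps the sketch short; a context that shares no word with any sense yields `None` rather than an arbitrary label.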
References
- Tim Van de Cruys and Marianna Apidianaki. 2011. Latent semantic word sense induction and disambiguation. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT), pp. 1476–1485. Portland, Oregon, USA.
- Note: strikethrough words in the contexts are not considered in the induction process; they are treated as stop words.