Chapter 6 describes DKPro WSD, our software framework supporting the rapid development and testing of sense disambiguation systems, and Chapter 7 covers GLASS, our sense-annotated German-language data set for word sense disambiguation and lexical substitution.

Though subsequent chapters provide background sections of their own, the topics they cover are specific to those chapters. The aggregate comparisons are then expressed in terms of evaluation metrics adapted from the field of information retrieval.
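The text does not say which information-retrieval metrics are meant. As a minimal sketch, assuming the conventional choice of precision, recall, and their harmonic mean (F1) over disambiguated instances, the scoring could look like the following. The function name `score_wsd` and the instance identifiers are hypothetical.

```python
from typing import Dict, Optional


def score_wsd(gold: Dict[str, str],
              predicted: Dict[str, Optional[str]]) -> Dict[str, float]:
    """Score sense assignments against a gold standard.

    Assumption: an instance counts as attempted when the system returned
    any sense for it; None (or a missing key) means no attempt was made.
    """
    attempted = [i for i in gold if predicted.get(i) is not None]
    correct = sum(1 for i in attempted if predicted[i] == gold[i])

    # Precision: share of attempted instances that are correct.
    precision = correct / len(attempted) if attempted else 0.0
    # Recall: share of all gold instances that are correct.
    recall = correct / len(gold) if gold else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy data: four instances, three attempted, two correct.
gold = {"i1": "bank%1", "i2": "bank%2", "i3": "bank%1", "i4": "bank%2"}
predicted = {"i1": "bank%1", "i2": "bank%2", "i3": "bank%2", "i4": None}
scores = score_wsd(gold, predicted)
```

Under this convention, a system that abstains on difficult instances can trade recall for precision, which is why the two are usually reported together.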

Both the original and simplified Lesk algorithms are susceptible to the aforementioned lexical gap problem. First, we enriched the glosses of WordNet senses with those from their aligned Wiktionary and Wikipedia senses. The distributional thesauri (DTs) produced by these automated techniques have been criticized for their inability to distinguish between synonyms, antonyms, hypernyms, etc.
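The lexical gap, and the effect of enriching glosses, can be illustrated with a toy overlap-based disambiguator. This is a minimal sketch rather than the implementation used in this work: the sense keys, glosses, stop-word list, and the helpers `simplified_lesk` and `enrich` are all invented for illustration and are not taken from WordNet, Wiktionary, or Wikipedia.

```python
from typing import Dict, Optional, Set

# Hypothetical, deliberately tiny stop-word list.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "or", "that"}


def content_words(text: str) -> Set[str]:
    """Lower-case, split on whitespace, and drop stop words."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def simplified_lesk(context: str, glosses: Dict[str, str]) -> Optional[str]:
    """Return the sense whose gloss shares the most words with the context.

    Returns None when no gloss overlaps the context at all, which is
    exactly the lexical gap situation.
    """
    ctx = content_words(context)
    best_sense, best_overlap = None, 0
    for sense, gloss in glosses.items():
        overlap = len(ctx & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


def enrich(glosses: Dict[str, str], extra: Dict[str, str]) -> Dict[str, str]:
    """Append the gloss of each aligned sense (if any) to the base gloss."""
    return {s: (g + " " + extra.get(s, "")).strip() for s, g in glosses.items()}


# Invented glosses standing in for a base inventory and an aligned resource.
BASE = {
    "bank%finance": "an institution that accepts deposits and lends money",
    "bank%river": "sloping land beside a body of water",
}
ALIGNED = {
    "bank%finance": "a business that keeps and lends money",
    "bank%river": "the edge of a river or stream",
}

context = "the boat drifted toward the bank of the river"
before = simplified_lesk(context, BASE)                  # no overlap
after = simplified_lesk(context, enrich(BASE, ALIGNED))  # overlap on "river"
```

With the base glosses alone, neither sense shares a content word with the context, so no decision can be made; the enriched gloss for the river sense introduces the word "river" and closes the gap.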

Intrinsic evaluations are popular because the frameworks are easy to define and implement (Palmer, Ng, et al.). However, none of these approaches fully solves the knowledge acquisition bottleneck; they can only reduce the amount of required training data or the time required to collect it.

It is therefore surprising that they have attracted very little attention in the fields of computational linguistics and natural language processing.

Despite the considerable research attention WSD has received, producing disambiguation systems which are both practical and highly accurate has remained an elusive open problem.

However, for the general case, we make no assumptions about the method that generates the lexical expansions, which could just as easily come from, say, translations via bridge languages, paraphrasing systems, or lexical substitution systems. Despite its simplicity, the original Lesk algorithm performs surprisingly well, and for this reason has spawned a number of variants and extensions, including one of our own. Depending on the method used, these knowledge sources can include machine-readable versions of traditional dictionaries and thesauri, semantic

2 There are some who have even questioned the very existence of word senses, at least as traditionally understood e.

We gave a high-level overview of approaches to WSD, including supervised methods, which require a collection of hand-annotated training examples, and knowledge-based techniques, which do not. Rather, they build a graph representing the context and use it to disambiguate all its words simultaneously.
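The text does not specify how such a graph is built or scored. As one possible sketch, assume that each candidate sense of each context word is a node, that senses of different words are linked by an edge weighted by gloss overlap, and that every word receives its highest-scoring sense by weighted degree. The function `graph_disambiguate` and the glosses below are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def words(text: str) -> Set[str]:
    """Lower-cased whitespace tokens of a gloss."""
    return set(text.lower().split())


def graph_disambiguate(candidates: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Jointly pick one sense per word using weighted degree centrality.

    candidates maps each context word to {sense_key: gloss}.
    """
    nodes = [(w, s) for w, senses in candidates.items() for s in senses]
    degree: Dict[Tuple[str, str], int] = {n: 0 for n in nodes}

    for (w1, s1), (w2, s2) in combinations(nodes, 2):
        if w1 == w2:
            continue  # senses of the same word compete; they are not linked
        weight = len(words(candidates[w1][s1]) & words(candidates[w2][s2]))
        degree[(w1, s1)] += weight
        degree[(w2, s2)] += weight

    # Every word is resolved in one pass over the same graph.
    return {w: max(senses, key=lambda s: degree[(w, s)])
            for w, senses in candidates.items()}


# Invented glosses for two ambiguous context words.
candidates = {
    "bank": {
        "finance": "institution that lends money and holds savings",
        "river": "sloping land beside a river",
    },
    "interest": {
        "money": "charge paid for borrowed money or savings",
        "curiosity": "feeling of wanting to learn about something",
    },
}
chosen = graph_disambiguate(candidates)
```

Because every node's score depends on the senses of the other words, the decision for one word is informed by the candidate senses of all the others, rather than each word being disambiguated in isolation.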

In semi-supervised and minimally supervised WSD, the amount of training data is increased by automatically acquiring new annotations from untagged corpora. Words are thereby represented by the concatenation of the surface form and the POS as assigned by the parser.

While isolated studies have demonstrated the usefulness of WSD in machine translation e. Its fundamental advantage over traditional dictionaries and thesauri is its basic unit of organization, the synset, a set of word forms expressing the same meaning.

Our experiments with the simplified Lesk algorithm use only the definitions provided by WordNet; they are intended to model the case where we have a generic MRD which provides sense definitions, but no additional lexical-semantic information such as example sentences or semantic relations. For this data set, we make a slight modification to our algorithm to account for this clustering:

Contextual clues may be identified by means of standard natural language processing software, including segmenters, tokenizers, stemmers, lemmatizers, part-of-speech taggers, and syntactic parsers; knowledge sources consulted can range from dictionaries to language models automatically constructed from raw or manually annotated text corpora.

In this section, we briefly survey the past tasks and data sets relevant to our work. At the end of this process, a more traditional gloss similarity-based approach is used to align any remaining unaligned senses. Cluster evaluations are appropriate if constructing the alignment is simply a means of decreasing the granularity of a single sense inventory. Our choice to use sentential context was motivated by simplicity and expediency; a more refined WSD algorithm could, of course, use a sliding or dynamically sized context window and thereby avoid this problem.
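A fixed-size sliding window, as an alternative to sentential context, can be sketched as follows. The function name `context_window` and the `size` parameter are assumptions made for illustration, and a dynamically sized window would need additional logic that is not shown here.

```python
from typing import List


def context_window(tokens: List[str], target_index: int, size: int) -> List[str]:
    """Return up to `size` tokens on each side of the target, excluding it."""
    if not 0 <= target_index < len(tokens):
        raise IndexError("target_index is outside the token sequence")
    start = max(0, target_index - size)
    end = min(len(tokens), target_index + size + 1)
    return tokens[start:target_index] + tokens[target_index + 1:end]


# Hypothetical example: the target word "bank" is at index 5.
tokens = "the boat drifted toward the bank of the river".split()
narrow = context_window(tokens, 5, 2)
wide = context_window(tokens, 5, 10)
```

The window is clipped at the boundaries of the token sequence, so a large `size` simply yields every token other than the target.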

The evaluation frameworks remain popular for assessing novel WSD systems. The main differences from vector-space approaches are the following:

To reuse part of the previous example, consider an annotation study using the coarse-grained dictionary with only two senses for the lemma. Examples of popular pre-built collocation resources include the Web 1T corpus (Brants and Franz) and the concordances distributed with the aforecited BNC and WaCKy corpora. To gain some insight as to why, or at least when, this is the case, we compared the instances incorrectly disambiguated when using the standard glosses but not when using the enriched glosses against the instances incorrectly disambiguated when using the enriched glosses but not when using the standard glosses.

