Download Ambiguity Resolution in Language Learning: Computational and by Hinrich Schütze PDF

By Hinrich Schütze

This volume is concerned with how ambiguity and ambiguity resolution are learned, that is, with the acquisition of the different representations of ambiguous linguistic forms and the knowledge necessary for selecting among them in context. Schütze concentrates on how the acquisition of ambiguity is possible in principle and demonstrates that particular types of algorithms and learning architectures (such as unsupervised clustering and neural networks) can succeed at the task. Three types of lexical ambiguity are treated: ambiguity in syntactic categorisation, semantic categorisation, and verbal subcategorisation. The volume presents three different models of ambiguity acquisition: Tag Space, Word Space, and Subcat Learner, and addresses the importance of ambiguity in linguistic representation and its relevance for linguistic innateness.

Why should it be our goal to reproduce one of the many possibilities that was chosen by the collectors of the Brown corpus? To put it differently, other human-made tag sets would also fare badly if they were evaluated for their ability to reproduce the Brown tag set. So an automatically generated tag set is not necessarily bad just because it does not reproduce the Brown tag set. For the reasons outlined above, the following evaluation procedure was chosen here:
• Cluster tokens into 200 clusters.
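
The clustering step can be illustrated with a small sketch. The excerpt does not say which clustering algorithm was used, so plain k-means with a deterministic farthest-first initialization is swapped in here as a stand-in. The function names (`farthest_first_init`, `kmeans`), the two-dimensional toy vectors, and `k=2` are all assumptions made for illustration; the procedure described above uses 200 clusters over token representations.

```python
import numpy as np

def farthest_first_init(points, k):
    """Pick k starting centroids: the first point, then repeatedly the
    point farthest from all centroids chosen so far (deterministic)."""
    centroids = [points[0]]
    while len(centroids) < k:
        diffs = points[:, None, :] - np.array(centroids)[None, :, :]
        nearest = np.linalg.norm(diffs, axis=2).min(axis=1)
        centroids.append(points[int(nearest.argmax())])
    return np.array(centroids, dtype=float)

def kmeans(points, k, iterations=20):
    """Plain k-means (a stand-in, not necessarily the book's algorithm).
    Returns one cluster index per row of `points`."""
    centroids = farthest_first_init(points, k)
    assignment = np.zeros(len(points), dtype=int)
    for _ in range(iterations):
        diffs = points[:, None, :] - centroids[None, :, :]
        assignment = np.linalg.norm(diffs, axis=2).argmin(axis=1)
        for j in range(k):
            members = points[assignment == j]
            if len(members):  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    return assignment

# Hypothetical token vectors: two well-separated groups of three tokens each.
token_vectors = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
])
assignment = kmeans(token_vectors, k=2)
```

On this toy input the first three tokens should share one cluster index and the last three the other; with real context vectors the same call would be made with `k=200`.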

1 million tokens and 47,025 word types.

The motivation for using the SVD is to address three problems, two of which were just discussed:
• sparseness
• generalization
• compactness
It is obvious how the last point is achieved: by reduction to a low-dimensional space the objects we need to deal with are smaller in size and we gain efficiency. SVD addresses the problems of sparseness and generalization because a high-dimensional space can represent more information than a low-dimensional space.
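
As a rough illustration of the reduction step, the sketch below applies a truncated SVD to a tiny word-by-context count matrix using NumPy. The counts, the word list, the helper names (`reduce_with_svd`, `cosine`), and the choice of `k=2` are invented for this example and are not taken from the book.

```python
import numpy as np

# Toy word-by-context count matrix (rows: words, columns: context words).
# The counts are made up for illustration only.
words = ["he", "she", "seemed", "would"]
counts = np.array([
    [0.0, 0.0, 4.0, 3.0, 1.0, 0.0],
    [0.0, 0.0, 3.0, 4.0, 0.0, 1.0],
    [5.0, 4.0, 0.0, 0.0, 1.0, 0.0],
    [4.0, 5.0, 0.0, 0.0, 0.0, 1.0],
])

def reduce_with_svd(matrix, k):
    """Represent each row by its coordinates along the top-k singular directions."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

reduced = reduce_with_svd(counts, k=2)

# Compare similarities in the reduced space.
sim_he_she = cosine(reduced[0], reduced[1])
sim_he_seemed = cosine(reduced[0], reduced[2])
```

Each word is now a 2-dimensional vector instead of a 6-dimensional one (compactness), and rows with similar but not identical counts, such as "he" and "she" here, end up close together in the reduced space, which is one way to picture the generalization effect.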

The rationale is that words with similar left context characterize words to their right in a similar way. For example, "seemed" and "would" have similar left contexts, and they characterize the right contexts of "he" and "the firefighter" as potentially containing an inflected verb form. Rather than having separate entries in its right context vector for "seemed", "would", and "likes", a word like "he" can now be characterized by a generalized entry for "inflected verb form occurs frequently to my right".
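
The idea of a generalized right-context entry can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the book's implementation: the toy corpus, the greedy cosine-threshold grouping, the threshold of 0.9, and all function names are hypothetical.

```python
from collections import Counter, defaultdict
import math

# Toy corpus, invented for illustration.
sentences = [
    "he seemed tired",
    "he would leave",
    "he likes tea",
    "the firefighter seemed tired",
    "the firefighter would leave",
    "the firefighter likes tea",
]

def left_context_vectors(sentences):
    """Map each word to a Counter of the words seen immediately to its left."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        for left, word in zip(tokens, tokens[1:]):
            vectors[word][left] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def group_by_left_context(vectors, threshold=0.9):
    """Greedily label each word with the first earlier word whose
    left-context vector is nearly parallel to its own (an assumed,
    simplified stand-in for a real clustering step)."""
    representatives = []
    label = {}
    for word in sorted(vectors):
        for rep in representatives:
            if cosine(vectors[word], vectors[rep]) >= threshold:
                label[word] = rep
                break
        else:
            representatives.append(word)
            label[word] = word
    return label

def generalized_right_context(word, sentences, label):
    """Count the classes, rather than the individual words, seen to the right of `word`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for current, right in zip(tokens, tokens[1:]):
            if current == word:
                counts[label[right]] += 1
    return counts

label = group_by_left_context(left_context_vectors(sentences))
he_context = generalized_right_context("he", sentences, label)
firefighter_context = generalized_right_context("firefighter", sentences, label)
```

In this toy corpus "seemed", "would", and "likes" share the same left contexts and so receive one class label; the right-context vectors of "he" and "firefighter" then each contain a single entry for that class instead of three separate word entries.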

