Explanation-Based Learning of Data-Oriented Parsing

August 20, 1997

A State-Transition Grammar for Data-Oriented Parsing

February 27, 1995

86% Match

David Tugwell (University of Edinburgh)

Computation and Language

This paper presents a grammar formalism designed for use in data-oriented approaches to language processing. The formalism is best described as a right-linear indexed grammar extended in linguistically interesting ways. The paper goes on to investigate how a corpus pre-parsed with this formalism may be processed to provide a probabilistic language model for use in the parsing of fresh texts.


A Probabilistic Approach to Lexical Semantic Knowledge Acquisition and Structural Disambiguation

December 1, 1998

86% Match

Hang Li (NEC Corporation)

Computation and Language

In this thesis, I address the problem of automatically acquiring lexical semantic knowledge, especially that of case frame patterns, from large corpus data and using the acquired knowledge in structural disambiguation. The approach I adopt has the following characteristics: (1) dividing the problem into three subproblems: case slot generalization, case dependency learning, and word clustering (thesaurus construction). (2) viewing each subproblem as that of statistical estimat...


Learning Unification-Based Natural Language Grammars

February 3, 1995

86% Match

Miles Osborne (Dept. of Computer Science, University of York, York, England)

Computation and Language

When parsing unrestricted language, wide-covering grammars often undergenerate. Undergeneration can be tackled either by sentence correction or by grammar correction. This thesis concentrates upon automatic grammar correction (or machine learning of grammar) as a solution to the problem of undergeneration. Broadly speaking, grammar correction approaches can be classified as being either data-driven or model-based. Data-driven learners use data-intensive methods ...


Towards History-based Grammars: Using Richer Models for Probabilistic Parsing

May 3, 1994

86% Match

Ezra Black, Fred Jelinek, John Lafferty, David M. Magerman, ... , Salim Roukos

Computation and Language

We describe a generative probabilistic model of natural language, which we call HBG, that takes advantage of detailed linguistic information to resolve ambiguity. HBG incorporates lexical, syntactic, semantic, and structural information from the parse tree into the disambiguation process in a novel way. We use a corpus of bracketed sentences, called a Treebank, in combination with decision tree building to tease out the relevant aspects of a parse tree that will determine the...


Parsing with the Shortest Derivation

September 27, 2000

86% Match

Rens Bod

Computation and Language

Common wisdom has it that the bias of stochastic grammars in favor of shorter derivations of a sentence is harmful and should be redressed. We show that the common wisdom is wrong for stochastic grammars that use elementary trees instead of context-free rules, such as Stochastic Tree-Substitution Grammars used by Data-Oriented Parsing models. For such grammars a non-probabilistic metric based on the shortest derivation outperforms a probabilistic metric on the ATIS and OVIS c...


Combining semantic and syntactic structure for language modeling

October 24, 2001

86% Match

Rens Bod

Computation and Language

Structured language models for speech recognition have been shown to remedy the weaknesses of n-gram models. All current structured language models are, however, limited in that they do not take into account dependencies between non-headwords. We show that non-headword dependencies contribute to significantly improved word error rate, and that a data-oriented parsing model trained on semantically and syntactically annotated data can exploit these dependencies. This paper also...


Aspects of Pattern-Matching in Data-Oriented Parsing

August 18, 2000

86% Match

Guy De Pauw

Computation and Language

Data-Oriented Parsing (DOP) ranks among the best parsing schemes, pairing state-of-the-art parsing accuracy with the psycholinguistic insight that larger chunks of syntactic structures are relevant grammatical and probabilistic units. Parsing with the DOP model, however, seems to involve a lot of CPU cycles and a considerable amount of double work, brought on by the concept of multiple derivations, which is necessary for probabilistic processing, but which is not convincingly r...


Exploiting Diversity for Natural Language Parsing

June 5, 2000

86% Match

John C. Henderson

Computation and Language

The popularity of applying machine learning methods to computational linguistics problems has produced a large supply of trainable natural language processing systems. Most problems of interest have an array of off-the-shelf products or downloadable code implementing solutions using various techniques. Where these solutions are developed independently, it is observed that their errors tend to be independently distributed. This thesis is concerned with approaches for capitaliz...


Exploiting Diversity in Natural Language Processing: Combining Parsers

June 1, 2000

85% Match

John C. Henderson, Eric Brill

Computation and Language

Three state-of-the-art statistical parsers are combined to produce more accurate parses, as well as new bounds on achievable Treebank parsing accuracy. Two general approaches are presented and two combination techniques are described for each approach. Both parametric and non-parametric models are explored. The resulting parsers surpass the best previously published performance results for the Penn Treebank.


Three studies of grammar-based surface-syntactic parsing of unrestricted English text. A summary and orientation

June 27, 1994

85% Match

Atro Voutilainen (Research Unit for Computational Linguistics, University of Helsinki)

Computation and Language

The dissertation addresses the design of parsing grammars for automatic surface-syntactic analysis of unconstrained English text. It consists of a summary and three articles. "Morphological disambiguation" documents a grammar for morphological (or part-of-speech) disambiguation of English, done within the Constraint Grammar framework proposed by Fred Karlsson. The disambiguator seeks to discard those of the alternative morphological analyses proposed by the lexical analys...
