Explanation-Based Learning of Data-Oriented Parsing

August 20, 1997

Khalil Sima'an (University of Utrecht)

Computer Science

Computation and Language

This paper presents a new view of Explanation-Based Learning (EBL) of natural language parsing. Rather than employing EBL for specializing parsers by inferring new ones, this paper suggests employing EBL for learning how to reduce ambiguity only partially. The present method consists of an EBL algorithm for learning partial-parsers, and a parsing algorithm which combines partial-parsers with existing "full-parsers". The learned partial-parsers, implementable as Cascades of Finite State Transducers (CFSTs), recognize and combine constituents efficiently, prohibiting spurious overgeneration. The parsing algorithm combines a learned partial-parser with a given full-parser such that the role of the full-parser is limited to combining the constituents recognized by the partial-parser and to recognizing unrecognized portions of the input sentence. Besides the reduction of the parse-space prior to disambiguation, the present method provides a way for refining existing disambiguation models that learn stochastic grammars from tree-banks. We exhibit encouraging empirical results using a pilot implementation: parse-space is reduced substantially with minimal loss of coverage. The speedup gain for disambiguation models is exemplified by experiments with the DOP model.
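
The division of labour described in the abstract can be pictured with a toy sketch. This is not the paper's learned partial-parser: the rule tables in `CASCADE`, the tag names, and the helpers `apply_level` and `partial_parse` are all hypothetical, and each level is a simple longest-match rewrite over label sequences standing in for one finite-state transducer in a cascade. Whatever no level recognizes is passed through unchanged, which is the part a full-parser would be left to handle.

```python
from typing import Dict, List, Tuple, Union

# A node is either a bare tag (unrecognized material) or (label, children).
Node = Union[str, Tuple[str, list]]
Rules = Dict[Tuple[str, ...], str]

# Hypothetical hand-written cascade; in the paper these would be learned by EBL.
CASCADE: List[Rules] = [
    {("DT", "NN"): "NP", ("DT", "JJ", "NN"): "NP"},    # level 1: base noun phrases
    {("IN", "NP"): "PP"},                               # level 2: prepositional phrases
    {("VBD", "NP"): "VP", ("VBD", "NP", "PP"): "VP"},   # level 3: verb phrases
]


def label(node: Node) -> str:
    """Return the category of a node, whether it is a bare tag or a constituent."""
    return node if isinstance(node, str) else node[0]


def apply_level(nodes: List[Node], rules: Rules) -> List[Node]:
    """Scan left to right, replacing the longest matching label sequence."""
    out: List[Node] = []
    max_len = max(len(pattern) for pattern in rules)
    i = 0
    while i < len(nodes):
        for n in range(min(max_len, len(nodes) - i), 0, -1):
            key = tuple(label(x) for x in nodes[i:i + n])
            if key in rules:
                out.append((rules[key], nodes[i:i + n]))
                i += n
                break
        else:
            # No pattern starts here: leave the node for the full-parser.
            out.append(nodes[i])
            i += 1
    return out


def partial_parse(tags: List[str]) -> List[Node]:
    """Run every level of the cascade in order over a part-of-speech sequence."""
    nodes: List[Node] = list(tags)
    for rules in CASCADE:
        nodes = apply_level(nodes, rules)
    return nodes


if __name__ == "__main__":
    chunks = partial_parse(["DT", "NN", "VBD", "DT", "JJ", "NN", "IN", "DT", "NN"])
    print([label(c) for c in chunks])
```

In this sketch the nine input tags collapse into two constituents, so a full-parser would only need to combine those two nodes instead of searching over the whole tag sequence.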

Learning Efficient Disambiguation

June 2, 1999

90% Match

Khalil Sima'an

Computation and Language

Artificial Intelligence

This dissertation analyses the computational properties of current performance-models of natural language parsing, in particular Data Oriented Parsing (DOP), points out some of their major shortcomings and suggests suitable solutions. It provides proofs that various problems of probabilistic disambiguation are NP-Complete under instances of these performance-models, and it argues that none of these models accounts for attractive efficiency properties of human language process...

Data-Oriented Language Processing. An Overview

November 14, 1996

90% Match

Rens Bod (University of Amsterdam), Remko Scha (University of Amsterdam)

Computation and Language

During the last few years, a new approach to language processing has started to emerge, which has become known under various labels such as "data-oriented parsing", "corpus-based interpretation", and "tree-bank grammar" (cf. van den Berg et al. 1994; Bod 1992-96; Bod et al. 1996a/b; Bonnema 1996; Charniak 1996a/b; Goodman 1996; Kaplan 1996; Rajman 1995a/b; Scha 1990-92; Sekine & Grishman 1995; Sima'an et al. 1994; Sima'an 1995-96; Tugwell 1995). This approach, which we will c...

Some Novel Applications of Explanation-Based Learning to Parsing Lexicalized Tree-Adjoining Grammars

May 10, 1995

89% Match

B. Srinivas (Department of Computer and Information Science, University of Pennsylvania), Aravind Joshi (Department of Computer and Information Science, University of Pennsylvania)

Computation and Language

In this paper we present some novel applications of the Explanation-Based Learning (EBL) technique to parsing Lexicalized Tree-Adjoining grammars. The novel aspects are (a) immediate generalization of parses in the training set, (b) generalization over recursive structures and (c) representation of generalized parses as Finite State Transducers. A highly impoverished parser called a "stapler" has also been introduced. We present experimental results using EBL for different corp...

Fast Parsing using Pruning and Grammar Specialization

April 26, 1996

89% Match

Manny Rayner (SRI International, Cambridge), David Carter (SRI International, Cambridge)

Computation and Language

We show how a general grammar may be automatically adapted for fast parsing of utterances from a specific domain by means of constituent pruning and grammar specialization based on explanation-based learning. These methods together give an order of magnitude increase in speed, and the coverage loss entailed by grammar specialization is reduced to approximately half that reported in previous work. Experiments described here suggest that the loss of coverage has been reduced to...

Two Questions about Data-Oriented Parsing

June 17, 1996

89% Match

Rens Bod (University of Amsterdam)

Computation and Language

In this paper I present ongoing work on the data-oriented parsing (DOP) model. In previous work, DOP was tested on a cleaned-up set of analyzed part-of-speech strings from the Penn Treebank, achieving excellent test results. This left, however, two important questions unanswered: (1) how does DOP perform if tested on unedited data, and (2) how can DOP be used for parsing word strings that contain unknown words? This paper addresses these questions. We show that parse results ...

Explanation-based Learning for Machine Translation

July 6, 1999

88% Match

Janine Toole, Fred Popowich, Devlan Nicholson, ..., Paul McFetridge

Computation and Language

In this paper we present an application of explanation-based learning (EBL) in the parsing module of a real-time English-Spanish machine translation system designed to translate closed captions. We discuss the efficiency/coverage trade-offs available in EBL and introduce the techniques we use to increase coverage while maintaining a high level of space and time efficiency. Our performance results indicate that this approach is effective.

Natural Language Parsing as Statistical Pattern Recognition

May 3, 1994

88% Match

David M. Magerman

Computation and Language

Traditional natural language parsers are based on rewrite rule systems developed in an arduous, time-consuming manner by grammarians. A majority of the grammarian's efforts are devoted to the disambiguation process, first hypothesizing rules which dictate constituent categories and relationships among words in ambiguous sentences, and then seeking exceptions and corrections to these rules. In this work, I propose an automatic method for acquiring a statistical parser from a...

A Data-Oriented Approach to Semantic Interpretation

June 18, 1996

87% Match

Rens Bod (University of Amsterdam), Remko Bonnema (University of Amsterdam), Remko Scha (University of Amsterdam)

Computation and Language

In Data-Oriented Parsing (DOP), an annotated language corpus is used as a stochastic grammar. The most probable analysis of a new input sentence is constructed by combining sub-analyses from the corpus in the most probable way. This approach has been successfully used for syntactic analysis, using corpora with syntactic annotations such as the Penn Treebank. If a corpus with semantically annotated sentences is used, the same approach can also generate the most probable semanti...

Applying Explanation-based Learning to Control and Speeding-up Natural Language Generation

December 8, 1997

87% Match

Guenter Neumann

Computation and Language

This paper presents a method for the automatic extraction of subgrammars to control and speed up natural language generation (NLG). The method is based on explanation-based learning (EBL). The main advantage of the proposed new method for NLG is that the complexity of the grammatical decision-making process during NLG can be vastly reduced, because the EBL method supports the adaptation of an NLG system to a particular use of a language.

Efficient Algorithms for Parsing the DOP Model

April 22, 1996

86% Match

Joshua Goodman (Harvard University)

Computation and Language

Excellent results have been reported for Data-Oriented Parsing (DOP) of natural language texts (Bod, 1993). Unfortunately, existing algorithms are both computationally intensive and difficult to implement. Previous algorithms are expensive due to two factors: the exponential number of rules that must be generated and the use of a Monte Carlo parsing algorithm. In this paper we solve the first problem by a novel reduction of the DOP model to a small, equivalent probabilistic c...
