ID: cmp-lg/9708012

Encoding Frequency Information in Lexicalized Grammars

August 19, 1997

John Carroll (University of Sussex), David Weir (University of Sussex)
Computer Science
Computation and Language

We address the issue of how to associate frequency information with lexicalized grammar formalisms, using Lexicalized Tree Adjoining Grammar as a representative framework. We consider systematically a number of alternative probabilistic frameworks, evaluating their adequacy from both a theoretical and empirical perspective using data from existing large treebanks. We also propose three orthogonal approaches for backing off probability estimates to cope with the large number of parameters involved.
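The backoff idea mentioned in the abstract can be illustrated with a minimal sketch. This is a hypothetical example, not the paper's actual scheme: when a word-specific estimate of a tree's probability is based on too few observations, it is linearly interpolated with a coarser, word-independent estimate (the tree names and counts below are invented for illustration):

```python
from collections import Counter

# Hypothetical counts: (elementary tree, anchor word) pairs, and trees alone.
specific = Counter({("alpha_nx0Vnx1", "eat"): 8, ("alpha_nx0V", "eat"): 2})
general = Counter({"alpha_nx0Vnx1": 300, "alpha_nx0V": 700})

def p_tree_given_word(tree, word, lam=0.7):
    """Linearly interpolate the word-specific relative frequency with the
    word-independent tree frequency (a simple backed-off estimate)."""
    word_total = sum(c for (t, w), c in specific.items() if w == word)
    spec = specific[(tree, word)] / word_total if word_total else 0.0
    gen = general[tree] / sum(general.values())
    return lam * spec + (1 - lam) * gen

p = p_tree_given_word("alpha_nx0Vnx1", "eat")  # 0.7*0.8 + 0.3*0.3 = 0.65
```

The interpolation weight `lam` would in practice be tuned on held-out data; here it is fixed for simplicity.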

Similar papers

Lexicalization and Grammar Development

October 21, 1994

90% Match
B. Srinivas (University of Pennsylvania), Dania Egedi (University of Pennsylvania), ..., Tilman Becker (University of Pennsylvania)
Computation and Language

In this paper we present a fully lexicalized grammar formalism as a particularly attractive framework for the specification of natural language grammars. We discuss in detail Feature-based, Lexicalized Tree Adjoining Grammars (FB-LTAGs), a representative of the class of lexicalized grammars. We illustrate the advantages of lexicalized grammars in various contexts of natural language processing, ranging from wide-coverage grammar development to parsing and machine translation....


Can Subcategorisation Probabilities Help a Statistical Parser?

June 21, 1998

88% Match
John Carroll (University of Sussex), Guido Minnen (University of Sussex), Ted Briscoe (Cambridge University)
Computation and Language

Research into the automatic acquisition of lexical information from corpora is starting to produce large-scale computational lexicons containing data on the relative frequencies of subcategorisation alternatives for individual verbal predicates. However, the empirical question of whether this type of frequency information can in practice improve the accuracy of a statistical parser has not yet been answered. In this paper we describe an experiment with a wide-coverage statist...


Disambiguation of Super Parts of Speech (or Supertags): Almost Parsing

October 26, 1994

88% Match
Aravind K. Joshi (University of Pennsylvania), B. Srinivas (University of Pennsylvania)
Computation and Language

In a lexicalized grammar formalism such as Lexicalized Tree-Adjoining Grammar (LTAG), each lexical item is associated with at least one elementary structure (supertag) that localizes syntactic and semantic dependencies. Thus a parser for a lexicalized grammar must search a large set of supertags to choose the right ones to combine for the parse of the sentence. We present techniques for disambiguating supertags using local information such as lexical preference and local lexi...
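The lexical-preference idea in this abstract can be sketched in a few lines. This is a deliberately simplified illustration (the lexicon entries and supertag names are invented): each word is assigned its most frequent supertag before parsing, so the parser combines one elementary tree per word instead of searching the full candidate set.

```python
# Hypothetical supertag lexicon: each word maps to candidate supertags
# with corpus counts (names and counts invented for illustration).
lexicon = {
    "John":  {"alpha_NP": 9, "beta_NPn": 1},
    "eats":  {"alpha_nx0Vnx1": 6, "alpha_nx0V": 4},
    "pizza": {"alpha_NP": 8, "beta_Nn": 2},
}

def disambiguate(sentence):
    """Pick each word's most frequent supertag (unigram lexical preference),
    leaving the parser only one elementary tree per word to combine."""
    return [max(lexicon[w], key=lexicon[w].get) for w in sentence]

tags = disambiguate(["John", "eats", "pizza"])
# tags == ["alpha_NP", "alpha_nx0Vnx1", "alpha_NP"]
```

The paper's actual models also use local contextual information (e.g. neighbouring supertags), which a unigram preference like this does not capture.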


Encoding Lexicalized Tree Adjoining Grammars with a Nonmonotonic Inheritance Hierarchy

May 15, 1995

87% Match
Roger Evans (University of Brighton), Gerald Gazdar (University of Sussex), David Weir (University of Sussex)
Computation and Language

This paper shows how DATR, a widely used formal language for lexical knowledge representation, can be used to define an LTAG lexicon as an inheritance hierarchy with internal lexical rules. A bottom-up featural encoding is used for LTAG trees and this allows lexical rules to be implemented as covariation constraints within feature structures. Such an approach eliminates the considerable redundancy otherwise associated with an LTAG lexicon.


An Empirical Evaluation of Probabilistic Lexicalized Tree Insertion Grammars

August 4, 1998

87% Match
Rebecca Hwa (Harvard University)
Computation and Language

We present an empirical study of the applicability of Probabilistic Lexicalized Tree Insertion Grammars (PLTIG), a lexicalized counterpart to Probabilistic Context-Free Grammars (PCFG), to problems in stochastic natural-language processing. Comparing the performance of PLTIGs with non-hierarchical N-gram models and PCFGs, we show that PLTIG combines the best aspects of both, with language modeling capability comparable to N-grams, and improved parsing performance over its non...


Prefix Probabilities from Stochastic Tree Adjoining Grammars

September 18, 1998

86% Match
Mark-Jan Nederhof (DFKI), Anoop Sarkar (UPenn), Giorgio Satta (UPadova)
Computation and Language

Language models for speech recognition typically use a probability model of the form Pr(a_n | a_1, a_2, ..., a_{n-1}). Stochastic grammars, on the other hand, are typically used to assign structure to utterances. A language model of the above form is constructed from such grammars by computing the prefix probability Sum_{w in Sigma*} Pr(a_1 ... a_n w), where w represents all possible terminations of the prefix a_1 ... a_n. The main result in this paper is an algorithm to comp...
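The prefix probability defined in this abstract can be made concrete with a toy stochastic grammar (this sketch uses a trivial grammar chosen for illustration, not the paper's TAG algorithm): the grammar S -> a S (0.5) | b (0.5) generates a^n b with probability 0.5^(n+1), and the prefix probability sums the probabilities of all complete strings extending a given prefix.

```python
def string_prob(s):
    """Probability that the toy grammar S -> a S (0.5) | b (0.5)
    generates exactly s; nonzero only for strings of the form a...ab."""
    if not s or s[-1] != "b" or "b" in s[:-1]:
        return 0.0
    return 0.5 ** len(s)

def prefix_prob(prefix, max_extra=60):
    """Sum Pr(prefix + w) over continuations w, truncating the
    (geometrically decaying) sum after max_extra extra 'a' symbols."""
    total = string_prob(prefix)  # continuation w = empty string
    for k in range(max_extra):
        total += string_prob(prefix + "a" * k + "b")
    return total

p = prefix_prob("aa")  # 0.125 + 0.0625 + ... ~= 0.25
```

For this grammar the truncated sum converges quickly; the point of the paper is an exact algorithm for the much harder case of general stochastic tree adjoining grammars, where continuations cannot simply be enumerated.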


Data-Oriented Language Processing. An Overview

November 14, 1996

86% Match
Rens Bod (University of Amsterdam), Remko Scha (University of Amsterdam)
Computation and Language

During the last few years, a new approach to language processing has started to emerge, which has become known under various labels such as "data-oriented parsing", "corpus-based interpretation", and "tree-bank grammar" (cf. van den Berg et al. 1994; Bod 1992-96; Bod et al. 1996a/b; Bonnema 1996; Charniak 1996a/b; Goodman 1996; Kaplan 1996; Rajman 1995a/b; Scha 1990-92; Sekine & Grishman 1995; Sima'an et al. 1994; Sima'an 1995-96; Tugwell 1995). This approach, which we will c...


Learning Computational Grammars

July 15, 2001

86% Match
John Nerbonne, Anja Belz, Nicola Cancedda, Herve Dejean, James Hammerton, Rob Koeling, Stasinos Konstantopoulos, Miles Osborne, ..., Erik F. Tjong Kim Sang
Computation and Language

This paper reports on the "Learning Computational Grammars" (LCG) project, a postdoc network devoted to studying the application of machine learning techniques to grammars suitable for computational use. We were interested in a more systematic survey to understand the relevance of many factors to the success of learning, esp. the availability of annotated data, the kind of dependencies in the data, and the availability of knowledge bases (grammars). We focused on syntax, esp....


Lexicalized Stochastic Modeling of Constraint-Based Grammars using Log-Linear Measures and EM Training

August 30, 2000

86% Match
Stefan Riezler, Detlef Prescher, ..., Mark Johnson
Computation and Language

We present a new approach to stochastic modeling of constraint-based grammars that is based on log-linear models and uses EM for estimation from unannotated data. The techniques are applied to an LFG grammar for German. Evaluation on an exact match task yields 86% precision for an ambiguity rate of 5.4, and 90% precision on a subcat frame match for an ambiguity rate of 25. Experimental comparison to training from a parsebank shows a 10% gain from EM training. Also, a new clas...


An Alternative Conception of Tree-Adjoining Derivation

April 4, 1994

86% Match
Yves Schabes, Stuart M. Shieber
Computation and Language

The precise formulation of derivation for tree-adjoining grammars has important ramifications for a wide variety of uses of the formalism, from syntactic analysis to semantic interpretation and statistical language modeling. We argue that the definition of tree-adjoining derivation must be reformulated in order to manifest the proper linguistic dependencies in derivations. The particular proposal is both precisely characterizable through a definition of TAG derivations as equ...
