ID: cmp-lg/9708012

Encoding Frequency Information in Lexicalized Grammars

August 19, 1997

Similar papers

Attaching Multiple Prepositional Phrases: Generalized Backed-off Estimation

October 16, 1997

85% Match
Paola Merlo (U. of Pennsylvania and University of Geneva), Matthew Crocker (University of Edinburgh), Cathy Berthouzoz (University of Geneva)
Computation and Language

There has recently been considerable interest in the use of lexically-based statistical techniques to resolve prepositional phrase attachments. To our knowledge, however, these investigations have only considered the problem of attaching the first PP, i.e., in a [V NP PP] configuration. In this paper, we consider one technique which has been successfully applied to this problem, backed-off estimation, and demonstrate how it can be extended to deal with the problem of multiple...

Probabilistic Parsing Using Left Corner Language Models

November 17, 1997

85% Match
Christopher D. Manning (University of Sydney), Bob Carpenter (Lucent Technologies Bell Labs)
Computation and Language

We introduce a novel parser based on a probabilistic version of a left-corner parser. The left-corner strategy is attractive because rule probabilities can be conditioned on both top-down goals and bottom-up derivations. We develop the underlying theory and explain how a grammar can be induced from analyzed data. We show that the left-corner approach provides an advantage over simple top-down probabilistic context-free grammars in parsing the Wall Street Journal using a gramm...

Unified Likelihood Ratio Estimation for High- to Zero-frequency N-grams

October 3, 2021

85% Match
Masato Kikuchi, Kento Kawakami, Kazuho Watanabe, ..., Kyoji Umemura
Computation and Language

Likelihood ratios (LRs), which are commonly used for probabilistic data processing, are often estimated based on the frequency counts of individual elements obtained from samples. In natural language processing, an element can be a continuous sequence of $N$ items, called an $N$-gram, in which each item is a word, letter, etc. In this paper, we attempt to estimate LRs based on $N$-gram frequency information. A naive estimation approach that uses only $N$-gram frequencies is s...

A Flexible POS tagger Using an Automatically Acquired Language Model

July 11, 1997

85% Match
Lluis Marquez, Lluis Padro
Computation and Language

We present an algorithm that automatically learns context constraints using statistical decision trees. We then use the acquired constraints in a flexible POS tagger. The tagger is able to use information of any degree: n-grams, automatically learned context constraints, linguistically motivated manually written constraints, etc. The sources and kinds of constraints are unrestricted, and the language model can be easily extended, improving the results. The tagger has been tes...

Bayesian Grammar Induction for Language Modeling

May 1, 1995

85% Match
Stanley F. Chen (Harvard University)
Computation and Language

We describe a corpus-based induction algorithm for probabilistic context-free grammars. The algorithm employs a greedy heuristic search within a Bayesian framework, and a post-pass using the Inside-Outside algorithm. We compare the performance of our algorithm to n-gram models and the Inside-Outside algorithm in three language modeling tasks. In two of the tasks, the training data is generated by a probabilistic context-free grammar and in both tasks our algorithm outperforms...

Learning Language from a Large (Unannotated) Corpus

January 14, 2014

85% Match
Linas Vepstas, Ben Goertzel
Computation and Language
Machine Learning

A novel approach to the fully automated, unsupervised extraction of dependency grammars and associated syntax-to-semantic-relationship mappings from large text corpora is described. The suggested approach builds on the authors' prior work with the Link Grammar, RelEx and OpenCog systems, as well as on a number of prior papers and approaches from the statistical language learning literature. If successful, this approach would enable the mining of all the information needed to ...

A State-Transition Grammar for Data-Oriented Parsing

February 27, 1995

85% Match
David Tugwell (University of Edinburgh)
Computation and Language

This paper presents a grammar formalism designed for use in data-oriented approaches to language processing. The formalism is best described as a right-linear indexed grammar extended in linguistically interesting ways. The paper goes on to investigate how a corpus pre-parsed with this formalism may be processed to provide a probabilistic language model for use in the parsing of fresh texts.

Precise n-gram Probabilities from Stochastic Context-free Grammars

May 10, 1994

85% Match
Andreas Stolcke (ICSI, Berkeley, CA), Jonathan Segal (ICSI, Berkeley, CA)
Computation and Language

We present an algorithm for computing n-gram probabilities from stochastic context-free grammars, a procedure that can alleviate some of the standard problems associated with n-grams (estimation from sparse data, lack of linguistic structure, among others). The method operates via the computation of substring expectations, which in turn is accomplished by solving systems of linear equations derived from the grammar. We discuss efficient implementation of the algorithm and rep...

Some Novel Applications of Explanation-Based Learning to Parsing Lexicalized Tree-Adjoining Grammars

May 10, 1995

85% Match
B. Srinivas (Department of Computer and Information Science, University of Pennsylvania), Aravind Joshi (Department of Computer and Information Science, University of Pennsylvania)
Computation and Language

In this paper we present some novel applications of the Explanation-Based Learning (EBL) technique to parsing Lexicalized Tree-Adjoining Grammars. The novel aspects are (a) immediate generalization of parses in the training set, (b) generalization over recursive structures and (c) representation of generalized parses as Finite State Transducers. A highly impoverished parser called a "stapler" has also been introduced. We present experimental results using EBL for different corp...

A Freely Available Syntactic Lexicon for English

October 21, 1994

85% Match
Dania Egedi (University of Pennsylvania), Patrick Martin (University of Pennsylvania)
Computation and Language

This paper presents a syntactic lexicon for English that was originally derived from the Oxford Advanced Learner's Dictionary and the Oxford Dictionary of Current Idiomatic English, and then modified and augmented by hand. There are more than 37,000 syntactic entries from all 8 parts of speech. An X-windows based tool is available for maintaining the lexicon and performing searches. C and Lisp hooks are also available so that the lexicon can be easily utilized by parsers and ...
