Encoding Frequency Information in Lexicalized Grammars

Status of the XTAG System

November 3, 1994

86% Match

Christy Doran (University of Pennsylvania), Dania Egedi (University of Pennsylvania), ... , Srinivas B. (University of Pennsylvania)

Computation and Language

XTAG is an ongoing project to develop a wide-coverage grammar for English, based on the Feature-based Lexicalized Tree Adjoining Grammar (FB-LTAG) formalism. The XTAG system integrates a morphological analyzer, an N-best part-of-speech tagger, an Earley-style parser and an X-window interface, along with a wide-coverage grammar for English developed using the system. This system serves as a linguist's workbench for developing FB-LTAG specifications. This paper presents a descri...

A New Statistical Parser Based on Bigram Lexical Dependencies

May 6, 1996

86% Match

Michael Collins (University of Pennsylvania)

Computation and Language

This paper describes a new statistical parser which is based on probabilities of dependencies between head-words in the parse tree. Standard bigram probability estimation techniques are extended to calculate probabilities of dependencies between pairs of words. Tests using Wall Street Journal data show that the method performs at least as well as SPATTER (Magerman 95, Jelinek et al 94), which has the best published results for a statistical parser on this task. The simplicity...

XTAG system - A Wide Coverage Grammar for English

October 20, 1994

86% Match

Christy Doran (University of Pennsylvania), Dania Egedi (University of Pennsylvania), Beth Ann Hockey (University of Pennsylvania), ... , Zaidel Martin (University of Pennsylvania)

Computation and Language

This paper presents the XTAG system, a grammar development tool based on the Tree Adjoining Grammar (TAG) formalism that includes a wide-coverage syntactic grammar for English. The various components of the system are discussed and preliminary evaluation results from the parsing of various corpora are given. Results from the comparison of XTAG against the IBM statistical parser and the Alvey Natural Language Tool parser are also given.

Automatic Extraction of Subcategorization from Corpora

February 4, 1997

86% Match

Ted Briscoe (Cambridge University), John Carroll (University of Sussex)

Computation and Language

We describe a novel technique and implemented system for constructing a subcategorization dictionary from textual corpora. Each dictionary entry encodes the relative frequency of occurrence of a comprehensive set of subcategorization classes for English. An initial experiment, on a sample of 14 verbs which exhibit multiple complementation patterns, demonstrates that the technique achieves accuracy comparable to previous approaches, which are all limited to a highly restricted...

Prepositional Phrase Attachment through a Backed-Off Model

June 22, 1995

85% Match

Michael Collins (University of Pennsylvania), James Brooks (University of Pennsylvania)

Computation and Language

Recent work has considered corpus-based or statistical approaches to the problem of prepositional phrase attachment ambiguity. Typically, ambiguous verb phrases of the form {v np1 p np2} are resolved through a model which considers values of the four head words (v, n1, p and n2). This paper shows that the problem is analogous to n-gram language models in speech recognition, and that one of the most common methods for language modeling, the backed-off estimate, is applicable. ...

Detecting Structural Irregularity in Electronic Dictionaries Using Language Modeling

October 29, 2014

85% Match

Paul Rodrigues, David Zajic, David Doermann, ... , Ye Peng

Computation and Language

Machine Learning

Dictionaries are often developed using tools that save to Extensible Markup Language (XML)-based standards. These standards often allow high-level repeating elements to represent lexical entries, and utilize descendants of these repeating elements to represent the structure within each lexical entry, in the form of an XML tree. In many cases, dictionaries are published that have errors and inconsistencies that are expensive to find manually. This paper discusses a method for ...

Natural Language Parsing as Statistical Pattern Recognition

May 3, 1994

85% Match

David M. Magerman

Computation and Language

Traditional natural language parsers are based on rewrite rule systems developed in an arduous, time-consuming manner by grammarians. A majority of the grammarian's efforts are devoted to the disambiguation process, first hypothesizing rules which dictate constituent categories and relationships among words in ambiguous sentences, and then seeking exceptions and corrections to these rules. In this work, I propose an automatic method for acquiring a statistical parser from a...

Coordination in Tree Adjoining Grammars: Formalization and Implementation

June 7, 1996

85% Match

Anoop Sarkar (Dept of Computer and Information Science, University of Pennsylvania), Aravind Joshi (Dept of Computer and Information Science, University of Pennsylvania)

Computation and Language

In this paper we show that an account for coordination can be constructed using the derivation structures in a lexicalized Tree Adjoining Grammar (LTAG). We present a notion of derivation in LTAGs that preserves the notion of fixed constituency in the LTAG lexicon while providing the flexibility needed for coordination phenomena. We also discuss the construction of a practical parser for LTAGs that can handle coordination including cases of non-constituent coordination.

Morphological Irregularity Correlates with Frequency

June 27, 2019

85% Match

Shijie Wu, Ryan Cotterell, Timothy J. O'Donnell

Computation and Language

We present a study of morphological irregularity. Following recent work, we define an information-theoretic measure of irregularity based on the predictability of forms in a language. Using a neural transduction model, we estimate this quantity for the forms in 28 languages. We first present several validatory and exploratory analyses of irregularity. We then show that our analyses provide evidence for a correlation between irregularity and frequency: higher frequency items a...

Comlex Syntax: Building a Computational Lexicon

November 10, 1994

85% Match

Ralph Grishman (Computer Science Department, New York University), Catherine Macleod (Computer Science Department, New York University), Adam Meyers (Computer Science Department, New York University)

Computation and Language

We describe the design of Comlex Syntax, a computational lexicon providing detailed syntactic information for approximately 38,000 English headwords. We consider the types of errors which arise in creating such a lexicon, and how such errors can be measured and controlled.
