Robust Parsing Based on Discourse Information: Completing partial parses of ill-formed sentences on the basis of discourse information

Fast Statistical Parsing of Noun Phrases for Document Indexing

February 12, 1997

86% Match

Chengxiang Zhai (Carnegie Mellon University)

Computation and Language

Information Retrieval (IR) is an important application area of Natural Language Processing (NLP) where one encounters the genuine challenge of processing large quantities of unrestricted natural language text. While much effort has been made to apply NLP techniques to IR, very few NLP techniques have been evaluated on a document collection larger than several megabytes. Many NLP techniques are simply not efficient enough, and not robust enough, to handle a large amount of tex...

Speech Repairs, Intonational Boundaries and Discourse Markers: Modeling Speakers' Utterances in Spoken Dialog

December 23, 1997

86% Match

Peter A. Heeman (University of Rochester)

Computation and Language

In this thesis, we present a statistical language model for resolving speech repairs, intonational boundaries and discourse markers. Rather than finding the best word interpretation for an acoustic signal, we redefine the speech recognition problem so that it also identifies the POS tags, discourse markers, speech repairs and intonational phrase endings (a major cue in determining utterance units). Adding these extra elements to the speech recognition problem actually allo...

Simple and Effective Text Simplification Using Semantic and Neural Methods

October 11, 2018

86% Match

Elior Sulem, Omri Abend, Ari Rappoport

Computation and Language

Sentence splitting is a major simplification operator. Here we present a simple and efficient splitting algorithm based on an automatic semantic parser. After splitting, the text is amenable to further fine-tuned simplification operations. In particular, we show that neural Machine Translation can be effectively used in this situation. Previous applications of Machine Translation to simplification suffer from a considerable disadvantage in that they are over-conservative, o...

Data-Oriented Language Processing. An Overview

November 14, 1996

86% Match

Rens Bod (University of Amsterdam), Remko Scha (University of Amsterdam)

Computation and Language

During the last few years, a new approach to language processing has started to emerge, which has become known under various labels such as "data-oriented parsing", "corpus-based interpretation", and "tree-bank grammar" (cf. van den Berg et al. 1994; Bod 1992-96; Bod et al. 1996a/b; Bonnema 1996; Charniak 1996a/b; Goodman 1996; Kaplan 1996; Rajman 1995a/b; Scha 1990-92; Sekine & Grishman 1995; Sima'an et al. 1994; Sima'an 1995-96; Tugwell 1995). This approach, which we will c...

A Robust System for Natural Spoken Dialogue

June 18, 1996

86% Match

James F. Allen (University of Rochester), Bradford W. Miller (University of Rochester), ... , Teresa Sikorski (University of Rochester)

Computation and Language

This paper describes a system that leads us to believe in the feasibility of constructing natural spoken dialogue systems in task-oriented domains. It specifically addresses the issue of robust interpretation of speech in the presence of recognition errors. Robustness is achieved by a combination of statistical error post-correction, syntactically- and semantically-driven robust parsing, and extensive use of the dialogue context. We present an evaluation of the system using t...

A Review on Part-of-Speech Technologies

October 11, 2021

86% Match

Onyenwe Ikechukwu, Onyedikachukwu Ikechukwu-Onyenwe, Onyedinma Ebele

Computation and Language

Machine Learning

Developing an automatic part-of-speech (POS) tagger for any new language is considered a necessary step before further computational linguistics methods beyond tagging, such as chunking and parsing, can be fully applied to the language. Many POS disambiguation technologies have been developed for this type of research, and several factors influence which one to choose. These technologies can be either corpus-based or non-corpus-based. In this paper, we present a review of POS ...

Learning Unification-Based Natural Language Grammars

February 3, 1995

86% Match

Miles Osborne (Dept. of Computer Science, University of York, York, England)

Computation and Language

When parsing unrestricted language, wide-covering grammars often undergenerate. Undergeneration can be tackled either by sentence correction or by grammar correction. This thesis concentrates upon automatic grammar correction (or machine learning of grammar) as a solution to the problem of undergeneration. Broadly speaking, grammar correction approaches can be classified as being either data-driven or model-based. Data-driven learners use data-intensive methods ...

Resolution of Unidentified Words in Machine Translation

November 9, 2009

86% Match

Sana Ullah, M. Asdaque Hussain, Kyung Sup Kwak

Computation and Language

This paper presents a mechanism for resolving unidentified lexical units in Text-based Machine Translation (TBMT). A Machine Translation (MT) system is unlikely to have a complete lexicon, and hence there is a pressing need for a new mechanism to handle the problem of unidentified words. These unknown words could be abbreviations, names, acronyms and newly introduced terms. We have proposed an algorithm for the resolution of the unidentified words. This algorithm takes discou...

Using the Web as an Implicit Training Set: Application to Noun Compound Syntax and Semantics

November 23, 2019

86% Match

Preslav Nakov

Computation and Language

Information Retrieval

An important characteristic of English written text is the abundance of noun compounds: sequences of nouns acting as a single noun, e.g., colon cancer tumor suppressor protein. While eventually mastered by domain experts, their interpretation poses a major challenge for automated analysis. Understanding noun compounds' syntax and semantics is important for many natural language applications, including question answering, machine translation, information retrieval, and inform...

When Do Discourse Markers Affect Computational Sentence Understanding?

September 1, 2023

86% Match

Ruiqi Li, Liesbeth Allein, ... , Marie-Francine Moens

Computation and Language

The capabilities and use cases of automatic natural language processing (NLP) have grown significantly over the last few years. While much work has been devoted to understanding how humans deal with discourse connectives, this phenomenon is understudied in computational systems. Therefore, it is important to put NLP models under the microscope and examine whether they can adequately comprehend, process, and reason within the complexity of natural language. In this chapter, we...
