Learning Algorithms for Keyphrase Extraction

December 10, 2002

Peter D. Turney (National Research Council of Canada)

Computer Science

Machine Learning

Computation and Language

Information Retrieval

Many academic journals ask their authors to provide a list of about five to fifteen keywords, to appear on the first page of each article. Since these keywords are often phrases of two or more words, we prefer to call them keyphrases. There is a wide variety of tasks for which keyphrases are useful, as we discuss in this paper. We approach the problem of automatically extracting keyphrases from text as a supervised learning task. We treat a document as a set of phrases, which the learning algorithm must learn to classify as positive or negative examples of keyphrases. Our first set of experiments applies the C4.5 decision tree induction algorithm to this learning task. We evaluate the performance of nine different configurations of C4.5. The second set of experiments applies the GenEx algorithm to the task. We developed the GenEx algorithm specifically for automatically extracting keyphrases from text. The experimental results support the claim that a custom-designed algorithm (GenEx), incorporating specialized procedural domain knowledge, can generate better keyphrases than a general-purpose algorithm (C4.5). Subjective human evaluation of the keyphrases generated by Extractor suggests that about 80% of the keyphrases are acceptable to human readers. This level of performance should be satisfactory for a wide variety of applications.
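The framing in the abstract above (a document as a set of phrases, each labeled a positive or negative example of a keyphrase) can be illustrated with a minimal sketch. This is not the paper's implementation: the candidate generation (all word n-grams up to three words, ignoring sentence boundaries), the three features, and the function names `candidate_phrases` and `training_examples` are assumptions chosen for illustration. The resulting labeled examples are the kind of input a classifier such as a decision tree learner would be trained on.

```python
import re
from collections import Counter


def candidate_phrases(text, max_len=3):
    """Collect word n-grams (1..max_len words) as candidate phrases.

    Assumption for illustration: candidates are plain n-grams that may
    cross sentence boundaries; a real system would filter them further.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter()
    first_pos = {}
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            phrase = " ".join(words[i:i + n])
            counts[phrase] += 1
            # Remember the word offset of the first occurrence only.
            first_pos.setdefault(phrase, i)
    return words, counts, first_pos


def training_examples(text, author_keyphrases, max_len=3):
    """Turn one document into (phrase, features, label) triples.

    A phrase is a positive example if it matches one of the
    author-supplied keyphrases, and a negative example otherwise.
    The features here are hypothetical, not those used in the paper.
    """
    words, counts, first_pos = candidate_phrases(text, max_len)
    targets = {k.lower() for k in author_keyphrases}
    examples = []
    for phrase, freq in counts.items():
        features = {
            "frequency": freq,
            "relative_first_position": first_pos[phrase] / len(words),
            "length_in_words": len(phrase.split()),
        }
        examples.append((phrase, features, phrase in targets))
    return examples


if __name__ == "__main__":
    doc = ("Keyphrase extraction is a learning task. "
           "Keyphrase extraction uses decision trees.")
    for phrase, feats, label in training_examples(
            doc, ["keyphrase extraction", "decision trees"]):
        if label:
            print(phrase, feats)
```

Most candidates come out as negative examples, so the classes are heavily imbalanced; any learner trained on such data would need to account for that.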

Learning to Extract Keyphrases from Text

December 8, 2002

98% Match

Peter D. Turney (National Research Council of Canada)

Machine Learning

Information Retrieval


KEA: Practical Automatic Keyphrase Extraction

February 5, 1999

93% Match

Ian H. Witten, Gordon W. Paynter, Eibe Frank, ..., Craig G. Nevill-Manning

Digital Libraries

Keyphrases provide semantic metadata that summarize and characterize documents. This paper describes Kea, an algorithm for automatically extracting keyphrases from text. Kea identifies candidate keyphrases using lexical methods, calculates feature values for each candidate, and uses a machine-learning algorithm to predict which candidates are good keyphrases. The machine learning scheme first builds a prediction model using training documents with known keyphrases, and then u...


A Review of Keyphrase Extraction

May 13, 2019

93% Match

Eirini Papagiannopoulou, Grigorios Tsoumakas

Computation and Language

Information Retrieval

Keyphrase extraction is a textual information processing task concerned with the automatic extraction of representative and characteristic phrases from a document that express all the key aspects of its content. Keyphrases constitute a succinct conceptual summary of a document, which is very useful in digital information management systems for semantic indexing, faceted search, document clustering and classification. This article introduces keyphrase extraction, provides a we...


A New Approach to Keyphrase Extraction Using Neural Networks

April 19, 2010

93% Match

Kamal Sarkar, Mita Nasipuri, Suranjan Ghose

Information Retrieval

Keyphrases provide a simple way of describing a document, giving the reader some clues about its contents. Keyphrases can be useful in various applications such as retrieval engines, browsing interfaces, thesaurus construction, text mining, etc. There are also other tasks for which keyphrases are useful, as we discuss in this paper. This paper describes a neural network based approach to keyphrase extraction from scientific articles. Our results show that the proposed metho...


Extraction of Keyphrases from Text: Evaluation of Four Algorithms

December 8, 2002

92% Match

Peter D. Turney (National Research Council of Canada)

Machine Learning

Information Retrieval

This report presents an empirical evaluation of four algorithms for automatically extracting keywords and keyphrases from documents. The four algorithms are compared using five different collections of documents. For each document, we have a target set of keyphrases, which were generated by hand. The target keyphrases were generated for human readers; they were not tailored for any of the four keyphrase extraction algorithms. Each of the algorithms was evaluated by the degree...


Mining the Web for Lexical Knowledge to Improve Keyphrase Extraction: Learning from Labeled and Unlabeled Data

December 8, 2002

91% Match

Peter D. Turney (National Research Council of Canada)

Machine Learning

Information Retrieval

Keyphrases are useful for a variety of purposes, including summarizing, indexing, labeling, categorizing, clustering, highlighting, browsing, and searching. The task of automatic keyphrase extraction is to select keyphrases from within the text of a given document. Automatic keyphrase extraction makes it feasible to generate keyphrases for the huge number of documents that do not have manually assigned keyphrases. Good performance on this task has been obtained by approaching...


Coherent Keyphrase Extraction via Web Mining

August 20, 2003

91% Match

Peter D. Turney (National Research Council of Canada)

Machine Learning

Computation and Language

Information Retrieval

Keyphrases are useful for a variety of purposes, including summarizing, indexing, labeling, categorizing, clustering, highlighting, browsing, and searching. The task of automatic keyphrase extraction is to select keyphrases from within the text of a given document. Automatic keyphrase extraction makes it feasible to generate keyphrases for the huge number of documents that do not have manually assigned keyphrases. A limitation of previous keyphrase extraction algorithms is th...


Keyphrase Extraction: Enhancing Lists

April 1, 2012

90% Match

Mario Jarmasz, Caroline Barrière

Computation and Language

Information Retrieval

This paper proposes some modest improvements to Extractor, a state-of-the-art keyphrase extraction system, by using a terabyte-sized corpus to estimate the informativeness and semantic similarity of keyphrases. We present two techniques to improve the organization and remove outliers of lists of keyphrases. The first is a simple ordering according to their occurrences in the corpus; the second is clustering according to semantic similarity. Evaluation issues are discussed. We...


How Document Pre-processing affects Keyphrase Extraction Performance

October 25, 2016

90% Match

Florian Boudin, Hugo Mougard, Damien Cram

Computation and Language

The SemEval-2010 benchmark dataset has brought renewed attention to the task of automatic keyphrase extraction. This dataset is made up of scientific articles that were automatically converted from PDF format to plain text and thus require careful preprocessing so that irrelevant spans of text do not negatively affect keyphrase extraction performance. In previous work, a wide range of document preprocessing techniques were described but their impact on the overall performance...


Keyphrase Extraction using Sequential Labeling

August 1, 2016

90% Match

Sujatha Das Gollapalli, Xiao-li Li

Computation and Language

Artificial Intelligence

Information Retrieval

Keyphrases efficiently summarize a document's content and are used in various document processing and retrieval tasks. Several unsupervised techniques and classifiers exist for extracting keyphrases from text documents. Most of these methods operate at the phrase level and rely on part-of-speech (POS) filters for candidate phrase generation. In addition, they do not directly handle keyphrases of varying lengths. We overcome these modeling shortcomings by addressing keyphrase ex...
