Ebook: Treebanks: Building and Using Parsed Corpora
- Tags: Linguistics (general), Computational Linguistics, Artificial Intelligence (incl. Robotics), Syntax, Grammar
- Series: Text Speech and Language Technology 20
- Year: 2003
- Publisher: Springer Netherlands
- Edition: 1
- Language: English
- Format: PDF
Linguists and engineers in Natural Language Processing increasingly rely on electronic corpora. Most research has long been limited to raw (unannotated) texts or to tagged texts (annotated with parts of speech only), but these approaches suffer from a word-by-word perspective. A new line of research involves corpora with richer annotations, such as clauses and major constituents, grammatical functions, and dependency links. The first parsed corpora were the English Lancaster Treebank and the Penn Treebank. New ones have recently been developed for other languages.
This book:
presents the state of the art in work on parsed corpora;
gathers 21 papers on building and using parsed corpora, raising many relevant questions;
deals with a variety of languages and a variety of corpora;
is for those working in linguistics, computational linguistics, natural language processing, syntax, and grammar.
Content:
Front Matter....Pages i-xxvi
Front Matter....Pages 1-1
The Penn Treebank: An Overview....Pages 5-22
Thoughts on Two Decades of Drawing Trees....Pages 23-41
Bank of English and Beyond....Pages 43-59
Completing Parsed Corpora....Pages 61-71
Syntactic Annotation of a German Newspaper Corpus....Pages 73-87
Annotation of Error Types for German Newsgroup Corpus....Pages 89-100
The Prague Dependency Treebank....Pages 103-127
An HPSG-Annotated Test Suite for Polish....Pages 129-146
Developing a Syntactic Annotation Scheme and Tools for a Spanish Treebank....Pages 149-163
Building a Treebank for French....Pages 165-187
Building the Italian Syntactic-Semantic Treebank....Pages 189-210
Automated Creation of a Medieval Portuguese Partial Treebank....Pages 211-227
Sinica Treebank....Pages 231-248
Building A Japanese Parsed Corpus....Pages 249-260
Building a Turkish Treebank....Pages 261-277
Front Matter....Pages 279-279
Encoding Syntactic Annotation....Pages 281-296
Parser Evaluation....Pages 299-316
Dependency-Based Evaluation of Minipar....Pages 317-329
Extracting Stochastic Grammars from Treebanks....Pages 333-349
A Uniform Method for Automatically Extracting Stochastic Lexicalized Tree Grammars from Treebanks and HPSG....Pages 351-365
From Treebank Resources to LFG F-Structures....Pages 367-389
Back Matter....Pages 391-407