Ebook: Parallel Text Processing: Alignment and Use of Translation Corpora
- Tags: Computational Linguistics, Language Translation and Linguistics, Artificial Intelligence (incl. Robotics), Applied Linguistics
- Series: Text, Speech and Language Technology, vol. 13
- Year: 2000
- Publisher: Springer Netherlands
- Edition: 1
- Language: English
- Format: PDF
This book evolved from the ARCADE evaluation exercise, which started in 1995. The project's goal is to evaluate alignment systems for parallel texts, i.e., texts accompanied by their translation. Thirteen teams from around the world have participated so far, and for the first time, some ten to fifteen years after the first alignment techniques were designed, the community has been able to get a clear picture of how alignment systems behave. Several chapters in this book describe the competing systems in detail, and the last chapter is devoted to the evaluation protocol and results. The remaining chapters were specially commissioned from researchers who have been major figures in the field in recent years, in an attempt to cover a wide range of topics that describe the state of the art in parallel text processing and use.
As I noted in the introduction, the Rosetta Stone won eternal fame as the prototype of parallel texts, but such texts are probably almost as old as the invention of writing. Nowadays, parallel texts are electronic, and they are becoming an increasingly important resource for building the natural language processing tools needed in the "multilingual information society" that is currently emerging at an incredible speed. Applications are numerous, and they are expanding every day: multilingual lexicography and terminology, machine and human translation, cross-language information retrieval, language learning, etc.
With the rising importance of multilingualism in language industries, brought about by global markets and world-wide information exchange, parallel corpora, i.e., corpora of texts accompanied by their translation, have become key resources in the development of natural language processing tools. The applications based on parallel corpora are numerous and growing: multilingual lexicography and terminology, machine and human translation, cross-language information retrieval, language learning, etc.
The book's chapters have been commissioned from major figures in the field of parallel corpus building and exploitation, with the aim of showing the state of the art in parallel text alignment and use ten to fifteen years after the first parallel-text alignment techniques were developed. Within the book, the following broad themes are addressed: (i) techniques for the alignment of parallel texts at various levels such as sentence, clause, and word; (ii) the use of parallel texts in fields as diverse as translation, lexicography, and information retrieval; (iii) available corpus resources and the evaluation of alignment methods.
The book will be of interest to researchers and advanced students of computational linguistics, terminology, lexicography and translation, both in academia and industry.
Contents:
- Front Matter (pp. i-xxiii)
- From the Rosetta stone to the information society (pp. 1-24)
- Pattern recognition for mapping bitext correspondence (pp. 25-47)
- Multilingual text alignment (pp. 49-67)
- A comprehensive bilingual word alignment system (pp. 69-96)
- A knowledge-lite approach to word alignment (pp. 97-116)
- From sentences to words and clauses (pp. 117-138)
- Bracketing and aligning words and constituents in parallel text using Stochastic Inversion Transduction Grammars (pp. 139-167)
- The translation network (pp. 169-186)
- Parallel text alignment using crosslingual information retrieval techniques (pp. 187-200)
- Parallel alignment of structured documents (pp. 201-217)
- A statistical view on bilingual lexicon extraction (pp. 219-236)
- Terminology extraction from parallel technical texts (pp. 237-252)
- Term alignment in use (pp. 253-274)
- Automatic dictionary extraction for cross-language information retrieval (pp. 275-298)
- Parallel texts in computer-assisted language learning (pp. 299-311)
- Japanese-English aligned bilingual corpora (pp. 313-334)
- Building a parallel corpus of English/Panjabi (pp. 335-346)
- Sharing of translation memory databases derived from aligned parallel text (pp. 347-368)
- Evaluation of parallel text alignment systems (pp. 369-388)
- Back Matter (pp. 389-403)