Ebook: Machine Learning in Document Analysis and Recognition
- Tags: Appl.Mathematics/Computational Methods of Engineering, Artificial Intelligence (incl. Robotics)
- Series: Studies in Computational Intelligence 90
- Year: 2008
- Publisher: Springer-Verlag Berlin Heidelberg
- Edition: 1
- Language: English
- Format: PDF
The objective of Document Analysis and Recognition (DAR) is to recognize the text and graphical components of a document and to extract information. With the first papers dating back to the 1960s, DAR is a mature but still growing research field with consolidated and well-known techniques. Optical Character Recognition (OCR) engines are among the most widely recognized products of research in this field, while broader DAR techniques are nowadays studied and applied in other industrial and office automation systems.
In the machine learning community, one of the best-known research problems addressed in DAR is the recognition of unconstrained handwritten characters, which has frequently been used as a benchmark for evaluating machine learning algorithms, especially supervised classifiers. However, developing a DAR system is a complex engineering task that involves integrating multiple techniques into an organic framework. A reader may feel that machine learning algorithms are not appropriate for DAR tasks other than character recognition. On the contrary, such algorithms have been used extensively for nearly all DAR tasks. While much emphasis has been devoted to character and word recognition, other tasks such as pre-processing, layout analysis, character segmentation, and signature verification have also benefited greatly from machine learning algorithms.
This book is a collection of research papers and state-of-the-art reviews by leading researchers from all over the world. It includes pointers to challenges and opportunities for future research.
The main goals of the book are to identify good practices for the use of learning strategies in DAR, to identify the DAR tasks best suited to these techniques, and to highlight new learning algorithms that may be successfully applied to DAR.
Content:
Front Matter....Pages I-XI
Introduction to Document Analysis and Recognition....Pages 1-20
Structure Extraction in Printed Documents Using Neural Approaches....Pages 21-43
Machine Learning for Reading Order Detection in Document Image Understanding....Pages 45-69
Decision-Based Specification and Comparison of Table Recognition Algorithms....Pages 71-103
Machine Learning for Digital Document Processing: from Layout Analysis to Metadata Extraction....Pages 105-138
Classification and Learning Methods for Character Recognition: Advances and Remaining Problems....Pages 139-161
Combining Classifiers with Informational Confidence....Pages 163-191
Self-Organizing Maps for Clustering in Document Image Analysis....Pages 193-219
Adaptive and Interactive Approaches to Document Analysis....Pages 221-257
Cursive Character Segmentation Using Neural Network Techniques....Pages 259-275
Multiple Hypotheses Document Analysis....Pages 277-303
Learning Matching Score Dependencies for Classifier Combination....Pages 305-332
Perturbation Models for Generating Synthetic Training Data in Handwriting Recognition....Pages 333-360
Review of Classifier Combination Methods....Pages 361-386
Machine Learning for Signature Verification....Pages 387-408
Off-line Writer Identification and Verification Using Gaussian Mixture Models....Pages 409-428
Back Matter....Pages 429-433