Ebook: Data Science and Classification
- Tags: Statistical Theory and Methods, Data Structures Cryptology and Information Theory, Information Systems and Communication Service, Pattern Recognition, Statistics for Business/Economics/Mathematical Finance/Insurance, Statistics for Soci
- Series: Studies in Classification, Data Analysis, and Knowledge Organization
- Year: 2006
- Publisher: Springer-Verlag Berlin Heidelberg
- Edition: 1
- Language: English
- Format: PDF
From the reviews:
"This book is a collection of papers presented at the Tenth Conference of the International Federation of Classification Societies. The contributors are primarily statisticians and computer scientists … . The typesetting and page layout are well done, and the graphics are very clear. … The main market for this book would be libraries, and researchers wanting a record of recent advances in statistical learning." (Jeffrey D. Picka, Technometrics, Vol. 49 (3), August, 2007)
Data Science and Classification presents new methodological developments in data analysis and classification. Its broad coverage includes the measurement of similarity and dissimilarity, methods for classification and clustering, network and graph analysis, analysis of symbolic data, and web mining. Beyond structural and theoretical results, the book offers application advice for a variety of problems in medicine, microarray analysis, social network structures, and music.
The combination of new methodological advances with the wide range of real applications collected in this volume will be of special value to researchers choosing the most appropriate of the newly developed analytical tools for their research problems in classification and data analysis.
Content:
Front Matter....Pages I-XII
Front Matter....Pages 1-1
A Tree-Based Similarity for Evaluating Concept Proximities in an Ontology....Pages 3-11
Improved Fréchet Distance for Time Series....Pages 13-20
Comparison of Distance Indices Between Partitions....Pages 21-28
Design of Dissimilarity Measures: A New Dissimilarity Between Species Distribution Areas....Pages 29-37
Dissimilarities for Web Usage Mining....Pages 39-46
Properties and Performance of Shape Similarity Measures....Pages 47-56
Front Matter....Pages 57-57
Hierarchical Clustering for Boxplot Variables....Pages 59-66
Evaluation of Allocation Rules Under Some Cost Constraints....Pages 67-73
Crisp Partitions Induced by a Fuzzy Set....Pages 75-82
Empirical Comparison of a Monothetic Divisive Clustering Method with the Ward and the k-means Clustering Methods....Pages 83-90
Model Selection for the Binary Latent Class Model: A Monte Carlo Simulation....Pages 91-99
Finding Meaningful and Stable Clusters Using Local Cluster Analysis....Pages 101-108
Comparing Optimal Individual and Collective Assessment Procedures....Pages 109-116
Front Matter....Pages 117-117
Some Open Problem Sets for Generalized Blockmodeling....Pages 119-130
Spectral Clustering and Multidimensional Scaling: A Unified View....Pages 131-139
Analyzing the Structure of U.S. Patents Network....Pages 141-148
Identifying and Classifying Social Groups: A Machine Learning Approach....Pages 149-157
Front Matter....Pages 159-159
Multidimensional Scaling of Histogram Dissimilarities....Pages 161-170
Dependence and Interdependence Analysis for Interval-Valued Variables....Pages 171-183
A New Wasserstein Based Distance for the Hierarchical Clustering of Histogram Symbolic Data....Pages 185-192
Front Matter....Pages 159-159
Symbolic Clustering of Large Datasets....Pages 193-201
A Dynamic Clustering Method for Mixed Feature-Type Symbolic Data....Pages 203-210
Front Matter....Pages 211-211
Iterated Boosting for Outlier Detection....Pages 213-220
Sub-species of Homopus Areolatus? Biplots and Small Class Inference with Analysis of Distance....Pages 221-228
Revised Boxplot Based Discretization as the Kernel of Automatic Interpretation of Classes Using Numerical Variables....Pages 229-237
Front Matter....Pages 239-239
Comparison of Two Methods for Detecting and Correcting Systematic Error in High-throughput Screening Data....Pages 241-249
kNN Versus SVM in the Collaborative Filtering Framework....Pages 251-260
Mining Association Rules in Folksonomies....Pages 261-270
Empirical Analysis of Attribute-Aware Recommendation Algorithms with Variable Synthetic Data....Pages 271-278
Patterns of Associations in Finite Sets of Items....Pages 279-286
Front Matter....Pages 287-287
Generalized N-gram Measures for Melodic Similarity....Pages 289-298
Evaluating Different Approaches to Measuring the Similarity of Melodies....Pages 299-306
Using MCMC as a Stochastic Optimization Procedure for Musical Time Series....Pages 307-314
Local Models in Register Classification by Timbre....Pages 315-322
Front Matter....Pages 323-323
Improving the Performance of Principal Components for Classification of Gene Expression Data Through Feature Selection....Pages 325-332
A New Efficient Method for Assessing Missing Nucleotides in DNA Sequences in the Framework of a Generic Evolutionary Model....Pages 333-340
New Efficient Algorithm for Modeling Partial and Complete Gene Transfer Scenarios....Pages 341-349
Back Matter....Pages 351-358