Ebook: Advances in Data Analysis: Proceedings of the 30th Annual Conference of the Gesellschaft für Klassifikation e.V., Freie Universität Berlin, March 8–10, 2006
- Tags: Statistical Theory and Methods, Data Mining and Knowledge Discovery, Statistics for Business/Economics/Mathematical Finance/Insurance, Statistics for Life Sciences Medicine Health Sciences, Statistics for Social Science Behavioral Scie
- Series: Studies in Classification Data Analysis and Knowledge Organization
- Year: 2007
- Publisher: Springer-Verlag Berlin Heidelberg
- Edition: 1
- Language: English
- pdf
This book focuses on exploratory data analysis, learning of latent structures in datasets, and unscrambling of knowledge. It covers a broad range of methods from multivariate statistics, clustering and classification, visualization and scaling as well as from data and time series analysis. It provides new approaches for information retrieval and data mining. In addition, the book reports challenging applications in marketing and management science, banking and finance, bio- and health sciences, linguistics and text analysis, statistical musicology and sound classification, as well as archaeology. Special emphasis is put on interdisciplinary research and the interaction between theory and practice.
Content:
Front Matter....Pages I-XVI
Front Matter....Pages 1-1
Mixture Models for Classification....Pages 3-14
How to Choose the Number of Clusters: The Cramer Multiplicity Solution....Pages 15-22
Model Selection Criteria for Model-Based Clustering of Categorical Time Series Data: A Monte Carlo Study....Pages 23-30
Cluster Quality Indexes for Symbolic Classification — An Examination....Pages 31-38
Semi-Supervised Clustering: Application to Image Segmentation....Pages 39-50
A Method for Analyzing the Asymptotic Behavior of the Walk Process in Restricted Random Walk Cluster Algorithm....Pages 51-58
Cluster and Select Approach to Classifier Fusion....Pages 59-66
Random Intersection Graphs and Classification....Pages 67-74
Optimized Alignment and Visualization of Clustering Results....Pages 75-82
Finding Cliques in Directed Weighted Graphs Using Complex Hermitian Adjacency Matrices....Pages 83-90
Text Clustering with String Kernels in R....Pages 91-98
Automatic Classification of Functional Data with Extremal Information....Pages 99-106
Typicality Degrees and Fuzzy Prototypes for Clustering....Pages 107-114
On Validation of Hierarchical Clustering....Pages 115-122
Front Matter....Pages 123-123
Rearranging Classified Items in Hierarchies Using Categorization Uncertainty....Pages 125-132
Localized Linear Discriminant Analysis....Pages 133-140
Calibrating Classifier Scores into Probabilities....Pages 141-148
Nonlinear Support Vector Machines Through Iterative Majorization and I-Splines....Pages 149-161
Deriving Consensus Rankings from Benchmarking Experiments....Pages 163-170
Classification of Contradiction Patterns....Pages 171-178
Selecting SVM Kernels and Input Variable Subsets in Credit Scoring Models....Pages 179-186
Front Matter....Pages 187-187
Simultaneous Selection of Variables and Smoothing Parameters in Geoadditive Regression Models....Pages 189-196
Modelling and Analysing Interval Data....Pages 197-208
Testing for Genuine Multimodality in Finite Mixture Models: Application to Linear Regression Models....Pages 209-216
Happy Birthday to You, Mr. Wilcoxon!....Pages 217-228
Equivalent Number of Degrees of Freedom for Neural Networks....Pages 229-236
Model Choice for Panel Spatial Models: Crime Modeling in Japan....Pages 237-244
A Boosting Approach to Generalized Monotonic Regression....Pages 245-254
From Eigenspots to Fisherspots — Latent Spaces in the Nonlinear Detection of Spot Patterns in a Highly Varying Background....Pages 255-262
Identifying and Exploiting Ultrametricity....Pages 263-272
Factor Analysis for Extraction of Structural Components and Prediction in Time Series....Pages 273-280
Classification of the U.S. Business Cycle by Dynamic Linear Discriminant Analysis....Pages 281-288
Examination of Several Results of Different Cluster Analyses with a Separate View to Balancing the Economic and Ecological Performance Potential of Towns and Cities....Pages 289-296
Front Matter....Pages 297-297
VOS: A New Method for Visualizing Similarities Between Objects....Pages 299-306
Multidimensional Scaling of Asymmetric Proximities with a Dominance Point....Pages 307-318
Single Cluster Visualization to Optimize Air Traffic Management....Pages 319-325
Rescaling Proximity Matrix Using Entropy Analyzed by INDSCAL....Pages 327-334
Front Matter....Pages 335-335
Canonical Forms for Frequent Graph Mining....Pages 337-349
Applying Clickstream Data Mining to Real-Time Web Crawler Detection and Containment Using ClickTips Platform....Pages 351-358
Plagiarism Detection Without Reference Collections....Pages 359-366
Putting Successor Variety Stemming to Work....Pages 367-374
Collaborative Filtering Based on User Trends....Pages 375-382
Investigating Unstructured Texts with Latent Semantic Analysis....Pages 383-390
Front Matter....Pages 391-391
Heterogeneity in Preferences for Odd Prices....Pages 393-400
Classification of Reference Models....Pages 401-408
Adaptive Conjoint Analysis for Pricing Music Downloads....Pages 409-416
Improving the Probabilistic Modeling of Market Basket Data....Pages 417-424
Classification in Marketing Research by Means of LEM2-generated Rules....Pages 425-432
Pricing Energy in a Multi-Utility Market....Pages 433-440
Disproportionate Samples in Hierarchical Bayes CBC Analysis....Pages 441-448
Building on the Arules Infrastructure for Analyzing Transaction Data with R....Pages 449-456
Balanced Scorecard Simulator — A Tool for Stochastic Business Figures....Pages 457-464
Integration of Customer Value into Revenue Management....Pages 465-472
Women’s Occupational Mobility and Segregation in the Labour Market: Asymmetric Multidimensional Scaling....Pages 473-480
Multilevel Dimensions of Consumer Relationships in the Healthcare Service Market M-L IRT vs. M-L SEM Approach....Pages 481-488
Data Mining in Higher Education....Pages 489-496
Attribute Aware Anonymous Recommender Systems....Pages 497-504
Front Matter....Pages 505-505
On the Notions and Properties of Risk and Risk Aversion in the Time Optimal Approach to Decision Making....Pages 507-514
A Model of Rational Choice Among Distributions of Goal Reaching Times....Pages 515-522
On Goal Reaching Time Distributions Estimated from DAX Stock Index Investments....Pages 523-530
Credit Risk of Collaterals: Examining the Systematic Linkage between Insolvencies and Physical Assets in Germany....Pages 531-538
Foreign Exchange Trading with Support Vector Machines....Pages 539-546
The Influence of Specific Information on the Credit Risk Level....Pages 547-554
Front Matter....Pages 555-555
Enhancing Bluejay with Scalability, Genome Comparison and Microarray Visualization....Pages 557-568
Discovering Biomarkers for Myocardial Infarction from SELDI-TOF Spectra....Pages 569-576
Joint Analysis of In-situ Hybridization and Gene Expression Data....Pages 577-584
Unsupervised Decision Trees Structured by Gene Ontology (GO-UDTs) for the Interpretation of Microarray Data....Pages 585-592
Front Matter....Pages 593-593
Clustering of Polysemic Words....Pages 595-602
Classifying German Questions According to Ontology-Based Answer Types....Pages 603-610
The Relationship of Word Length and Sentence Length: The Inter-Textual Perspective....Pages 611-618
Comparing the Stability of Different Clustering Results of Dialect Data....Pages 619-626
Part-of-Speech Discovery by Clustering Contextual Features....Pages 627-634
Front Matter....Pages 635-635
A Probabilistic Framework for Audio-Based Tonal Key and Chord Recognition....Pages 637-644
Using MCMC as a Stochastic Optimization Procedure for Monophonic and Polyphonic Sound....Pages 645-652
Vowel Classification by a Neurophysiologically Parameterized Auditory Model....Pages 653-660
Front Matter....Pages 661-661
Uncovering the Internal Structure of the Roman Brick and Tile Making in Frankfurt-Nied by Cluster Validation....Pages 663-670
Where Did I See You Before... A Holistic Method to Compare and Find Archaeological Artifacts....Pages 671-680
Back Matter....Pages 681-687