Advanced Statistical Methods for the Analysis of Large Data-Sets

The theme of the meeting was “Statistical Methods for the Analysis of Large Data-Sets”. In recent years there has been increasing interest in this subject; in fact a huge quantity of information is often available but standard statistical techniques are usually not well suited to managing this kind of data. The conference serves as an important meeting point for European researchers working on this topic and a number of European statistical societies participated in the organization of the event.

The book includes 45 papers selected from the 156 papers accepted for presentation and discussed at the conference on “Advanced Statistical Methods for the Analysis of Large Data-sets.”




Table of Contents

Cover
Advanced Statistical Methods for the Analysis of Large Data-Sets
Editorial
Preface
Contents

Part I Clustering Large Data-Sets

  Clustering Large Data Set: An Applied Comparative Study
    1 Introduction
    2 Clustering Strategies
    3 Application to Real Data
    References

  Clustering in Feature Space for Interesting Pattern Identification of Categorical Data
    1 Introduction
    2 Support Vector Clustering on MCA Factors
    3 Empirical Evidence
    4 Conclusion
    References

  Clustering Geostatistical Functional Data
    1 Introduction
    2 Geostatistical Functional Data
    3 Dynamic Clustering for Spatio-Functional Data
    4 Clusterwise Linear Regression for Spatially Correlated Functional Data
    5 Mareographic Network Analysis
    References

  Joint Clustering and Alignment of Functional Data: An Application to Vascular Geometries
    1 Introduction
    2 The k-Mean Alignment Algorithm
    3 Analysis of Internal Carotid Artery Centerlines
      3.1 Centerline Clusters vs Cerebral Aneurysms
      3.2 Centerline Shapes vs Cerebral Aneurysms
    4 Conclusions
    References

Part II Statistics in Medicine

  Bayesian Methods for Time Course Microarray Analysis: From Genes' Detection to Clustering
    1 Introduction
    2 Detection of Differentially Expressed Genes in Time Course Microarray Experiments
    3 Clustering Time Course Profiles
    4 Results
      4.1 Simulations
      4.2 Case Study
    References

  Longitudinal Analysis of Gene Expression Profiles Using Functional Mixed-Effects Models
    1 Introduction
    2 A Case Study: Tuberculosis and BCG Vaccination
    3 Mixed-Effects Smoothing Splines Models
      3.1 Statistical Inference
    4 Performance Assessment Using Simulated Longitudinal Data
    5 Experimental Results
    6 Conclusions
    References

  A Permutation Solution to Compare Two Hepatocellular Carcinoma Markers
    1 Introduction
    2 The Data
    3 The NPC Test Methodology
    4 The Closed Testing Procedure
    5 Diagnostic Tests
    6 The Analysis
    7 Final Remarks
    References

Part III Integrating Administrative Data

  Statistical Perspective on Blocking Methods When Linking Large Data-sets
    1 Introduction
    2 Blocking Methods
      2.1 The Standard Blocking Method
      2.2 The Sorted Neighbourhood Method
    3 A Statistical Perspective in Comparing Blocking Methods
    4 Experimental Results
    5 Concluding Remarks and Future Works
    References

  Integrating Households Income Microdata in the Estimate of the Italian GDP
    1 Introduction
    2 The Income Approach
    3 How Much do Survey Microdata fit National Accounts?
    4 Conclusions
    References

  The Employment Consequences of Globalization: Linking Data on Employers and Employees in the Netherlands
    1 Introduction
    2 Methodology: Creating a Linked Employer-Employee Dataset for the Netherlands
    3 First Results
      3.1 Number of Employees
      3.2 Wages and Pay
      3.3 Employment Turnover
    4 Conclusions and Further Research
    References

  Applications of Bayesian Networks in Official Statistics
    1 Introduction
    2 Background on Bayesian Networks
    3 Use of Bayesian Networks in Survey Sampling
      3.1 Use of Bayesian Networks for Poststratification
      3.2 Use of Bayesian Networks for Integration
    4 Use of Bayesian Networks for Imputation of Missing Data
    5 Monitoring Data Production Process Using Bayesian Networks
    References

Part IV Outliers and Missing Data

  A Correlated Random Effects Model for Longitudinal Data with Non-ignorable Drop-Out: An Application to University Student Performance
    1 Introduction
    2 Statistical Modeling
    3 Data and Variables
    4 Results
    5 Conclusions
    References

  Risk Analysis Approaches to Rank Outliers in Trade Data
    1 Introduction
    2 Risk Analysis Framework
    3 Approaches to Rank Low Price Outliers in Trade Data
    4 Application of the Ranking Criteria
    5 Final Remarks
    References

  Problems and Challenges in the Analysis of Complex Data: Static and Dynamic Approaches
    1 Introduction
    2 Some Difficulties in Data Analysis
      2.1 The Presence of Outliers
      2.2 Calibration of Test Procedures
      2.3 Subpopulations
    3 An Example of Large Complex Corrupted Data
    4 Forward Directions for the Forward Search
      4.1 Automatic Classification Procedures
      4.2 Timeliness and On-Line Systems
      4.3 Automatic Model Selection Procedures
    5 Choosing Regression Models with Mallow's Cp
      5.1 Background and Aggregate Cp
      5.2 The Forward Search and Forward Cp
    6 Credit Card Data
      6.1 Background and Aggregate Model Selection
      6.2 The Generalized Candlestick Plot
      6.3 Outlier Detection
      6.4 Model Building and Checking
    7 Computation
    References

  Ensemble Support Vector Regression: A New Non-parametric Approach for Multiple Imputation
    1 Introduction
    2 Multiple Imputation Approaches
    3 A New Approach Based on Ensemble Support Vector Regression
    4 Results
    5 Conclusion
    References

Part V Time Series Analysis

  On the Use of PLS Regression for Forecasting Large Sets of Cointegrated Time Series
    1 Introduction
    2 The Model
    3 Estimation of the Factors
    4 The Data-Set
    5 Estimation and Forecasting
    6 Conclusions
    References

  Large-Scale Portfolio Optimisation with Heuristics
    1 Overview
    2 Methodology and Techniques
      2.1 A One-Period Model
      2.2 Techniques
        2.2.1 Differential Evolution
        2.2.2 Particle Swarm Optimisation
        2.2.3 Threshold Accepting
      2.3 Data and Computational Complexity
    3 Convergence Results
      3.1 Minimising Squared Variation
      3.2 Minimising Losses
    4 Conclusion
    References

  Detecting Short-Term Cycles in Complex Time Series Databases
    1 Introduction
    2 The Clustering Algorithm and the Tuning Parameters
    3 A Proposal for the Selection of the Optimal Partition
    4 The Results on the Observed Database of Time Series Energy Consumption
    References

  Assessing the Beneficial Effects of Economic Growth: The Harmonic Growth Index
    1 Motivation
    2 A Formal Definition of Harmonic Growth
    3 The Harmonic Growth Index
    4 Testing the Harmonic Growth Hypothesis: The Case of India
      4.1 Data
      4.2 Results
    5 Conclusions
    References

  Time Series Convergence within I(2) Models: the Case of Weekly Long Term Bond Yields in the Four Largest Euro Area Countries
    1 Introduction
    2 The Analysis Within the I(1) Model
    3 The Analysis Within the I(2) Model
    4 Conclusions
    References

Part VI Environmental Statistics

  Anthropogenic CO2 Emissions and Global Warming: Evidence from Granger Causality Analysis
    1 Introduction
    2 The Datasets
    3 Unit Root Testing
    4 Testing for Granger Causality
    References

  Temporal and Spatial Statistical Methods to Remove External Effects on Groundwater Levels
    1 Introduction
    2 Data Analysis on a Monthly Scale
      2.1 Data Pre-processing
      2.2 Modelling the Effects of Neighboring Waterways via Transfer Function Models
    3 Data Analysis on a Daily Scale
      3.1 The Al04 Study: Rain Predictions Based on Kriging
      3.2 Modelling the Joint Effects of Rain and Neighboring Rivers
    4 Discussion
    References

  Reduced Rank Covariances for the Analysis of Environmental Data
    1 Introduction
    2 Modelling Observational Data
    3 The Reduced Rank Covariance (RRC) Method
      3.1 Simulation Study
    4 Applications
      4.1 Satellite Data
      4.2 Ozone Data
    5 Conclusions and Further Developments
    References

  Radon Level in Dwellings and Uranium Content in Soil in the Abruzzo Region: A Preliminary Investigation by Geographically Weighted Regression
    1 Introduction
    2 Methodology
    3 Data and Modelling Setting
      3.1 Geologic Setting of the Area
      3.2 Indoor Radon Data
      3.3 Soil Radiometric and Climate Data
      3.4 Data Processing and Model Setting
    4 Results and Discussion
    References

Part VII Probability and Density Estimation

  Applications of Large Deviations to Hidden Markov Chains Estimation
    1 Introduction
    2 Study Framework
    3 Construction of the Confidence Interval
    4 Conclusions
    References

  Multivariate Tail Dependence Coefficients for Archimedean Copulae
    1 Introduction
    2 Archimedean Survival Copulae
    3 Tail Dependence
    4 MB Copula Functions
      4.1 MB1 Copula
      4.2 MB7 Copula
    5 Concluding Remarks
    References

  A Note on Density Estimation for Circular Data
    1 Introduction
    2 Toroidal Kernels
    3 Toroidal Density Estimation
    4 Numerical Evidence
    References

  Markov Bases for Sudoku Grids
    1 Introduction and Preliminary Material
    2 Moves and Markov Bases for Sudoku Grids
    3 The 4 × 4 Sudoku
    4 Partially Filled 4 × 4 Grids
    5 Further Developments
    References

Part VIII Application in Economics

  Estimating the Probability of Moonlighting in Italian Building Industry
    1 Introduction
    2 Creation of the Data Set and the Variables Used
      2.1 The Assessment of the Integrated Dataset
    3 The Estimation of Probability of Moonlighting
      3.1 Non Parametric Models: CART
      3.2 Parametric Models
    4 Conclusions and Further Work
    References

  Use of Interactive Plots and Tables for Robust Analysis of International Trade Data
    1 Introduction
    2 Main Forward Plots in Regression
    3 Use of Dynamic Forward Plots in the Analysis of Trade Data
    4 Use of More Complex Forward Plots
    5 The Perspective of the Trade Analyst
    6 Conclusion
    References

  Generational Determinants on the Employment Choice in Italy
    1 Background and Introduction
    2 Exploring Self-Employment in Italy: Data Sources and Variables
    3 A Methodological View: A Latent Variable Model for Binary Outcomes
      3.1 A Set of Predictors for Individual Characteristics and as Proxy of the Different Forms of Capital and Regional Variations
    4 Main Empirical Evidence and Effective Interpretation
    5 Some Concluding Remarks and Further Developments
    References

  Route-Based Performance Evaluation Using Data Envelopment Analysis Combined with Principal Component Analysis
    1 Introduction
    2 Methods
      2.1 Data Envelopment Analysis (DEA)
      2.2 The PCA-DEA Formulation
    3 Case Study
      3.1 Data
      3.2 Empirical Results
    4 Conclusions and Future Research
    References

Part IX WEB and Text Mining

  Web Surveys: Methodological Problems and Research Perspectives
    1 Introduction
    2 Questionnaire Design
    3 Data Collection Modes, Recruitment and Inference
    4 Bias in Web Surveys
    5 Final Comments
    References

  Semantic Based DCM Models for Text Classification
    1 Introduction
    2 Dirichlet Compound Multinomial
    3 A sbDCM Model
    4 sbDCM with T Unknown
    5 Semantic-based (DCM) with T Known in Advance
      5.1 The Reputational Risk
      5.2 Data Analysis
    6 Conclusion
    References

  Probabilistic Relational Models for Operational Risk: A New Application Area and an Implementation Using Domain Ontologies
    1 Introduction
    2 Relational Domain Models
    3 Probabilities in Relational Domain Models
      3.1 Case Studies
    4 An Implementation Using a Domain Ontology Development and Runtime Environment
    5 Conclusion
    References

Part X Advances on Surveys

  Efficient Statistical Sample Designs in a GIS for Monitoring the Landscape Changes
    1 Introduction
    2 Stratification of the Area Frame with a GIS
    3 Alternative Adaptive Sample Selection Procedures
    4 Comparison of the Methods with Fixed Precision of the Estimate
    5 Comparison of the Methods with Fixed Budget
    6 Concluding Remarks
    References

  Studying Foreigners' Migration Flows Through a Network Analysis Approach
    1 Aims
    2 Source, Data and Territorial Grid
    3 Methods of Analysis
    4 Main Results
    5 Conclusions
    References

  Estimation of Income Quantiles at the Small Area Level in Tuscany
    1 Introduction
    2 Small Area Methods for Poverty Estimates
    3 Estimation of Household Income Quantiles in Tuscany
    4 Conclusions
    References

  The Effects of Socioeconomic Background and Test-taking Motivation on Italian Students' Achievement
    1 Introduction
    2 The Italian Literacy Divide
    3 The Student Test-Taking Motivation
    4 Some Methodological Aspects
      4.1 Data Envelopment Analysis (DEA)
      4.2 The Multilevel Model
    5 Main Results
    6 Concluding Remarks
    References

Part XI Multivariate Analysis

  Firm Size Dynamics in an Industrial District: The Mover-Stayer Model in Action
    1 Introduction
    2 The Model
      2.1 Estimation
        2.1.1 Goodman (1961)
        2.1.2 Frydman (1984)
    3 Empirical Application
      3.1 Exploratory Analysis
      3.2 The Mover-Stayer
      3.3 Concentration
    4 Conclusions
    References

  Multiple Correspondence Analysis for the Quantification and Visualization of Large Categorical Data Sets
    1 Introduction
    2 Multiple Correspondence Analysis of Data Chunks
      2.1 Computations of Multiple Correspondence Analysis
      2.2 Evolutionary MCA Solutions
    3 Enhanced Methods for SVD
      3.1 Incremental SVD Procedure
    4 Update of MCA-Like Results
    5 Example of Application
    6 Future Work
    References

  Multivariate Ranks-Based Concordance Indexes
    1 An Introduction to Concordance Index Problem
    2 Concordance Problem Analysis in a Multivariate Context
      2.1 Proposal: Multivariate Ranks-Based Approach
      2.2 Some Practical Results
    3 Conclusion
    References

  Methods for Reconciling the Micro and the Macro in Family Demography Research: A Systematisation
    1 The Need to Bridge the Gap Between the Micro and the Macro Perspectives in Family Demography Research
    2 Methodological Individualism
    3 Bridging the Macro-to-Micro Gap: Multi-Level Event-History Analyses
    4 Bridging the Micro-to-Macro Gap: Meta-Analyses and Agent-Based Computational Models
      4.1 Meta-Analytic Techniques
      4.2 Agent-Based Computational Models
    5 Towards an Empirical Implementation of the Theoretical Model: Implications for Data Collection and an Avenue for Future Research
    References