# DATA ANALYSIS AND STATISTICAL LEARNING

12 CFU - 1st and 2nd Semester

### Teaching Staff

ANTONIO PUNZO - Module Data Analysis and Statistical Learning - SECS-S/01 - 6 CFU
Email: antonio.punzo@unict.it
Office: Palazzo delle Scienze, 3rd floor, room 24
Phone: 0957536640
Office Hours: Friday 11:00-13:00
SALVATORE INGRASSIA - Module Statistical Learning - SECS-S/01 - 6 CFU
Email: s.ingrassia@unict.it
Office: Dipartimento di Economia e Impresa, Corso Italia 55 - 95129 Catania
Phone: 0957537732
Office Hours: to be determined based on the lecture timetable (see http://www.dei.unict.it/docenti/salvatore.ingrassia)

## Learning Objectives

• Data Analysis and Statistical Learning
1. Knowledge and understanding. The first “Statistical Learning” module mainly concerns the fundamentals of two of the main methods used in unsupervised learning: principal component analysis and cluster analysis.
2. Applying knowledge and understanding. On completion, the student will be able to: i) implement the main methods used in unsupervised learning; ii) summarize the main features of a dataset and extract knowledge from data properly.
3. Making judgements. On completion, the student will be able to choose a suitable statistical model, apply it, and perform the analysis using statistical software.
4. Communication skills. On completion, the student will be able to present the results of the statistical analysis and the conclusions that can be drawn from it.
5. Learning skills. On completion, the student will be able to understand the structure of unsupervised learning.
• Statistical Learning
1. Knowledge and understanding. The module aims to provide knowledge of: i) the setting of the learning problem and the general model of the risk functional from empirical data; ii) the main statistical learning techniques for regression and data classification.
2. Applying knowledge and understanding. On completion, the student will be able to: i) implement the main statistical models for supervised and unsupervised learning; ii) summarize the main features of a dataset and extract knowledge from data properly.
3. Making judgements. On completion, students will be able to choose a suitable statistical model, apply sound statistical methods, and perform the analyses using statistical software.
4. Communication skills. On completion, students will be able to present the results of the statistical analyses and the conclusions that can be drawn from them.
5. Learning skills. On completion, students will be able to understand the structure of statistical learning.

## Course Structure

• Data Analysis and Statistical Learning

Lectures via slides. The freely available R statistical software will also be used.

• Statistical Learning

Lectures and practical data modeling in R.

## Detailed Course Content

• Data Analysis and Statistical Learning

Statistical Models for Univariate Random Variables. Discrete and continuous random variables. Basic distribution functions. Expectation and variance. Statistical models for random variables. Inference by maximum likelihood. Goodness-of-fit tests.

Basics of Multivariate Modelling. Random Vectors and Their Distributions, Standard Estimators of Covariance and Correlation, The Multivariate Normal Distribution.

Unsupervised Learning. Principal Component Analysis. Biplot. Cluster Analysis. Model-based clustering. Mixture models: mixtures of distributions and mixtures of regressions.

• Statistical Learning

Statistical Learning. Estimation of dependences based on empirical data. Supervised and Unsupervised Learning. Regression and Classification problems. Parametric and non-parametric models. Assessing Model Accuracy.

Linear Regression. Simple linear regression. Multiple linear regression. Least squares criterion and parameter estimation. Assessing the accuracy of the coefficient estimates and of the model. Use of qualitative predictors. Extension of the linear model and non-linear relationships.

Classification. Logistic regression; parameter estimation. Linear and quadratic discriminant analysis.

Linear Model Selection and Regularization. Variable selection. Dimension reduction methods.

Support Vector Machines and Neural Networks. Support vector classifiers. Deep learning and multilayer perceptrons.

Mixture models. Mixtures of distributions. Mixtures of regressions.

## Textbook Information

• Data Analysis and Statistical Learning
1. Hastie T., Tibshirani R., Friedman J. (2008). The Elements of Statistical Learning, Springer, New York.
2. James G., Witten D., Hastie T., Tibshirani R. (2017). An Introduction to Statistical Learning with Applications in R, Springer, New York.
3. McNeil A. J., Frey R., Embrechts P. (2005). Quantitative Risk Management: Concepts, Techniques and Tools. Princeton University Press, Princeton, New Jersey.
4. Kassambara A. (2017). Practical Guide To Cluster Analysis in R - Unsupervised Machine Learning, STHDA.
• Statistical Learning
1. James G., Witten D., Hastie T., Tibshirani R. (2017). An Introduction to Statistical Learning with Applications in R, Springer, New York.
2. Hastie T., Tibshirani R., Friedman J. (2008). The Elements of Statistical Learning, Springer, New York.
3. Course notes
