1. Knowledge and understanding. The first “Statistical Learning” module mainly covers the fundamentals of two of the main methods used in unsupervised learning: principal component analysis and cluster analysis.
2. Applying knowledge and understanding. On completion, the student will be able: i) to implement the main methods used in unsupervised learning; ii) to summarize the main features of a dataset and properly extract knowledge from data.
3. Making judgements. On completion, the student will be able to choose a suitable statistical model, apply it, and carry out the analysis using statistical software.
4. Communication skills. On completion, the student will be able to present the results of a statistical analysis and the conclusions that can be drawn from them.
5. Learning skills. On completion, the student will be able to understand the structure of unsupervised learning.
Lectures via slides. The freely available R statistical software will also be used.
Lectures and practical data modeling in R.
Basic notions in statistics, linear algebra, and computing.
Basics of statistics and data analysis, matrix algebra, and calculus.
Recommended.
Mandatory
Statistical Learning. Estimation of dependencies based on empirical data. Supervised and Unsupervised Learning. Regression and Classification problems. Parametric and non-parametric models. Assessing Model Accuracy.
Linear Regression. Simple linear regression. Multiple linear regression. Least squares criterion and parameter estimation. Assessing the accuracy of the coefficient estimates and of the model. Use of qualitative predictors. Extension of the linear model and non-linear relationships.
Classification. Logistic regression; parameter estimation. Linear and quadratic discriminant analysis.
Linear Model Selection and Regularization. Variable selection. Dimension reduction methods.
Support Vector Machines and Neural Networks. Support vector classifiers. Deep learning and multilayer perceptrons.
Mixture models. Mixtures of distributions. Mixtures of regressions.
http://studium.unict.it/dokeos/
See the Studium website: http://studium.unict.it/dokeos/2019/
DATA ANALYSIS
No. | Topics | References
1 | Statistical Models for Univariate Random Variables | Slides
2 | Basics of Matrices | Bishop 2007, Appendix C
3 | Basics of Multivariate Modelling | McNeil, Frey and Embrechts 2005, Chapter 3
4 | Principal Component Analysis (PCA) | James, Witten, Hastie, Tibshirani 2017, Chapter 10
5 | Cluster Analysis (CA) | Kassambara 2017, Chapter 3
6 | Hierarchical clustering methods | Kassambara 2017, Chapter 7
7 | Partitioning (or partitional) clustering methods | Kassambara 2017, Chapters 4–5
8 | Cluster Validation | Kassambara 2017, Chapters 11–14
9 | Model-Based Clustering | Kassambara 2017, Chapter 18
STATISTICAL LEARNING
No. | Topics | References
1 | Fundamentals of Statistical Learning: estimation of dependencies based on empirical data; supervised and unsupervised learning; regression and classification problems | Textbook #1: Chap. 1 and Chap. 2, Sect. 2.1
2 | Fundamentals of Statistical Learning: parametric and non-parametric models; assessing model accuracy; Lab: introduction to R | Textbook #1: Chap. 2, Sect. 2.2 and 2.3
3 | Linear Regression: simple and multiple linear regression; least squares criterion and parameter estimation | Textbook #1: Chap. 3, Sect. 3.1 and 3.2
4 | Linear Regression: assessment of model fit; qualitative predictors; extensions of the linear model and non-linear relationships; Lab with R | Textbook #1: Chap. 3, Sect. 3.3 and 3.6
5 | Classification: logistic regression; linear and quadratic discriminant analysis; Lab with R | Textbook #1: Chap. 4
6 | Linear Model Selection and Regularization: variable selection; dimension reduction methods; Lab with R | Textbook #1: Chap. 6
7 | Support Vector Machines: support vector classifiers; Lab with R | Textbook #1: Chap. 6
8 | Neural Networks: deep learning and multilayer perceptrons; Lab with R | Course notes
9 | Mixture Models: mixtures of distributions; mixtures of regressions; Lab with R | Course notes
The exam assesses whether the learning objectives have been achieved. It consists of an oral exam that includes questions on the course program and a discussion of a report on a real data analysis, performed using both the methods covered in the course and the R statistical software.
Practical activities (data analysis and modeling with R) and oral exam